A HEARING PROSTHESIS WITH AUTOMATIC CLASSIFICATION OF THE LISTENING
ENVIRONMENT 

FIELD OF THE INVENTION 


The present invention relates to a hearing prosthesis and method providing automatic 
identification or classification of a listening environment by applying one or several 
predetermined Hidden Markov Models to process acoustic signals obtained from the 
listening environment. The hearing prosthesis may utilise determined classification results 
to control parameter values of a predetermined signal processing algorithm or to control a
switching between different pre-set listening programs so as to optimally adapt the signal 
processing of the hearing prosthesis to a given listening environment. 

BACKGROUND OF THE INVENTION 


Today's digitally controlled or Digital Signal Processing (DSP) hearing instruments are
often provided with a number of pre-set listening programs. These pre-set listening
programs are often included to accommodate a comfortable and intelligible reproduced
sound quality in differing listening environments. Audio signals obtained from these
listening environments may have highly different characteristics, e.g. in terms of average
and maximum sound pressure levels (SPLs) and/or frequency content. Therefore, for a
DSP based hearing prosthesis, each type of listening environment may require a
particular setting of algorithm parameters of a signal processing algorithm of the hearing
prosthesis to ensure that the user is provided with an optimum reproduced signal quality
in all types of listening environments. Algorithm parameters that typically could be
adjusted from one listening program to another include parameters related to broadband
gain, corner frequencies or slopes of frequency-selective filter algorithms and parameters
controlling e.g. knee-points and compression ratios of Automatic Gain Control (AGC)
algorithms. Consequently, today's DSP based hearing aids are usually provided with a
number of different pre-set listening programs, each tailored to a particular listening
environment and/or particular user preferences. Characteristics of these pre-set listening
programs are typically determined during an initial fitting session in a dispenser's office
and programmed into the aid by transmitting corresponding algorithms and algorithm
parameters to, or activating them in, a non-volatile memory area of the hearing prosthesis.

The hearing aid user is subsequently left with the task of manually selecting, typically by
actuating a push-button on the hearing aid or a program button on a remote control,
between the pre-set listening programs in accordance with the current listening or sound
environment. Accordingly, when entering and leaving the multitude of sound
environments in his/her daily whereabouts, the hearing aid user may have to devote his/her
attention to the delivered sound quality and continuously search for the best program
setting in terms of comfortable sound quality and/or the best speech intelligibility.

It would therefore be highly desirable to provide a hearing prosthesis such as a hearing
aid or cochlear implant device that is capable of automatically classifying the user's
current listening environment as belonging to one of a number of typical everyday
listening environments. Thereafter, classification results could be utilised in the hearing
prosthesis to adjust the algorithm parameters of the current listening program, or to switch
to another more suitable pre-set listening program, to maintain optimum sound quality
and/or speech intelligibility for the individual hearing aid user.

Attempts have been made in the past to adapt signal processing characteristics of a
hearing aid to the type of listening environment that the user is situated in. US 5,687,241
discloses a multi-channel DSP based hearing instrument that utilises continuous
determination or calculation of one or several percentile values of input signal amplitude
distributions to discriminate between speech and noise input signals in the listening
environment. Gain values in the frequency channels are subsequently altered in response
to the detected levels of speech and noise. However, it is often desirable to discriminate
between subtle characteristics of the input signal of the hearing aid, not just between
speech and noise. As an example, it may be desirable to switch between an
omni-directional and a directional microphone listening program in dependence on, not
just the level of background noise, but also further signal characteristics of this background
noise. In situations where the user of the hearing prosthesis communicates with another
individual in the presence of the background noise, it would be beneficial if it were possible
to identify and classify the type of background noise. Omni-directional operation could be
selected in the event that the noise is traffic noise to allow the user to clearly hear
approaching traffic independent of its direction of arrival. If, on the other hand, the
background noise was classified as being babble noise, the directional listening program
could be selected to allow the user to obtain a reproduced signal with improved signal to
noise ratio during a communication with the other individual.

Such a detailed characterisation of an input signal from a listening environment may be
obtained by applying Hidden Markov Models for analysis and classification of the input
signal. Hidden Markov Models are capable of modelling stochastic input signals in terms
of both short and long time temporal variations rather than just being restricted to
modelling long term amplitude distribution statistics or average power. Hidden Markov
Models are well known in the field of speech recognition as a tool for modelling statistical
properties of stochastic speech signals. The article "A Tutorial on Hidden Markov Models
and Selected Applications in Speech Recognition", published in Proceedings of the IEEE,
Vol. 77, No. 2, February 1989, contains a comprehensive description of the application of
Hidden Markov Models to problems in speech recognition.

The present applicants have, however, for the first time applied Hidden Markov Models to
the task of classifying the listening environment of a hearing prosthesis to provide automatic
adjustment of one or several parameter(s) of a predetermined signal processing algorithm
executed in processing means of the hearing prosthesis in dependence on these
classification results.

SUMMARY OF THE INVENTION 


One object of the invention is to provide a hearing prosthesis that automatically adjusts
itself to a surrounding listening environment by controlling one or several algorithm
parameters of a predetermined signal processing algorithm to allow a user to
automatically obtain intelligible and comfortable amplified sound in a variety of different
listening environments.

It is another object of the invention to provide a hearing prosthesis that continuously and
automatically classifies an input signal as belonging to one of several everyday listening
environments and indicates the classification results to processing means to allow the
latter to perform the above-mentioned control of the algorithm parameters.



DESCRIPTION OF THE INVENTION

A first aspect of the invention relates to a hearing prosthesis comprising a microphone
adapted to generate an input signal in response to receiving an acoustic signal from a
listening environment,

an output transducer for converting a processed output signal into an electrical or an
acoustic output signal,

processing means adapted to process the input signal in accordance with a
predetermined signal processing algorithm and related algorithm parameters to generate
the processed output signal,

a memory area storing values of the related algorithm parameters for the predetermined
processing algorithm,

the processing means being further adapted to:

segment the input signal into consecutive signal frames of time duration, $T_{frame}$, and
generate respective feature vectors, $O(t)$, representing predetermined signal features of
the consecutive signal frames,

process the feature vectors with at least one Hidden Markov Model,
$\lambda^{source} = \{A^{source}, b(O(t)), \alpha_0^{source}\}$, associated with a predetermined sound source to
determine element value(s) of a classification vector indicating a probability of the
predetermined sound source being active in the listening environment,

control one or several values of the related algorithm parameters in dependence on
element value(s) of the classification vector. Thereby, characteristics of the predetermined
signal processing algorithm are adapted to the current listening environment. The at least
one Hidden Markov Model (HMM) comprises:

$A^{source}$ = A state transition probability matrix;

$b(O(t))$ = Probability function for the input observation $O(t)$ for each state of the at least
one Hidden Markov Model;

$\alpha_0^{source}$ = An initial state probability distribution vector.

The hearing prosthesis may be a hearing instrument or aid such as a Behind The Ear
(BTE), an In The Ear (ITE) or Completely In the Canal (CIC) hearing aid. The input signal
generated by the microphone may be an analogue signal or a digital signal in a multi-bit
format or in single bit format generated by a microphone amplifier/buffer or an integrated
analogue-to-digital converter, respectively. Preferably, the input signal to the processing
means is provided as a digital input signal. Therefore, in case the microphone signal is
provided in analogue form, it is preferably converted into a corresponding digital input
signal by a suitable analogue-to-digital converter (A/D converter) which may be included
in an integrated circuit of the hearing prosthesis. The microphone signal may be subjected
to various signal processing operations such as amplification and bandwidth limiting
before being applied to the A/D converter, and to other operations afterwards, such as
decimation, before the digital input signal is applied to the processing means.

The output transducer that converts the processed output signal into an acoustic or
electrical signal or signals may be a conventional hearing aid speaker, often called a
"receiver", or another sound pressure transducer producing a perceivable acoustic signal
to the user of the hearing prosthesis. The output transducer may also comprise a number
of electrodes that may be operatively connected to the user's auditory nerve or nerves.

In the present specification and claims the term "predetermined signal processing
algorithm" designates any processing algorithm, executed by the processing means of the
hearing prosthesis, that generates the processed output signal from the input signal.
Accordingly, the "predetermined signal processing algorithm" may comprise a plurality of
sub-algorithms or sub-routines that each performs a particular subtask in the
predetermined signal processing algorithm. As an example, the predetermined signal
processing algorithm may comprise different signal processing sub-routines such as
frequency selective filtering, single or multi-channel compression, adaptive feedback
cancellation, speech detection and noise reduction, etc.

Furthermore, several distinct selections of the above-mentioned signal processing
sub-routines may be grouped together to form two, three or more different pre-set listening
programs which the user may be able to select between in accordance with his/her
preferences.

The predetermined signal processing algorithm will have one or several related algorithm
parameters. These algorithm parameters can usually be divided into a number of smaller
parameter sets, where each such algorithm parameter set is related to a particular part
of the predetermined signal processing algorithm or to a particular sub-routine as explained
above. These parameter sets control certain characteristics of their respective sub-routines
such as corner frequencies and slopes of filters, compression thresholds and ratios of
compressor algorithms, adaptation rates and probe signal characteristics of adaptive
feedback cancellation algorithms, etc.

Values of the algorithm parameters are preferably intermediately stored in a volatile data
memory area of the processing means such as a data RAM area during execution of the
predetermined signal processing algorithm. Initial values of the algorithm parameters are
stored in a non-volatile memory area such as an EEPROM/Flash memory area or battery
backed-up RAM memory area to allow these algorithm parameters to be retained during
power supply interruptions, usually caused by the user's removal or replacement of the
hearing aid's battery or manipulation of an ON/OFF switch.


The processing means may comprise one or several processors and its/their associated
memory circuitry. The processor may be constituted by a fixed point or floating point
Digital Signal Processor (DSP) with a single or dual MAC architecture that performs both
the calculations required in the predetermined signal processing algorithm and a
number of so-called household tasks such as monitoring and reading values of external
interface signals and programming ports. Alternatively, the processing means may
comprise a DSP that performs number crunching, i.e. multiplication, addition, division, etc.,
while a commercially available, or even proprietary, microprocessor kernel handles the
household tasks, which mostly involve logic operations and decision making.


The DSP may be a software programmable type executing the predetermined signal
processing algorithm in accordance with instructions stored in an associated program
RAM area. A data RAM area integrated with the processing means may store initial and
intermediate values of the related algorithm parameters and other data variables during
execution of the predetermined signal processing algorithm as well as various other
household variables. Such a software programmable DSP may be advantageous for
some applications due to the possibility of rapidly implementing and testing modifications
of the predetermined signal processing algorithm. Clearly, the same advantages apply to
sub-routines that handle the household tasks. Alternatively, the processing means may be
constituted by a hard-wired DSP core so as to execute one or several fixed predetermined
signal processing algorithm(s) in accordance with a fixed set of instructions from an
associated logic controller. In this type of hard-wired processor architecture, the memory
area storing values of the related algorithm parameters may be provided in the form of a
register file or as a RAM area if the number of algorithm parameters justifies the latter
solution.

According to the invention, the processing means are further adapted to segment the
input signal into consecutive signal frames of duration $T_{frame}$ and generate respective
feature vectors, $O(t)$, representing predetermined signal features of the consecutive
signal frames. The feature vectors are subsequently processed with at least one Hidden
Markov Model, $\lambda^{source} = \{A^{source}, b(O(t)), \alpha_0^{source}\}$, associated with a predetermined sound
source to determine element value(s) of a classification vector. This classification vector
indicates a probability of the predetermined sound source being active in the current
listening environment. By controlling one or several values of the algorithm parameters
related to the predetermined signal processing algorithm in dependence on element
value(s) of the classification vector, the processing of the input signal is adapted to the
listening environment in dependence on these element value(s). The consecutive signal
frames may be non-overlapping or overlapping with a predetermined amount of overlap,
e.g. an overlap of between 10 % and 50 %, to avoid sharp discontinuities at boundaries
between neighbouring signal frames and/or counteract window effects of any applied
window function, such as a Hanning window, at the boundaries. While the above-mentioned
frame segmentation of the input signal is required for the purpose of
generating the feature vectors, $O(t)$, and processing these with the at least one Hidden
Markov Model, the predetermined signal processing algorithm may process the input
signal on a sample-by-sample basis or on a frame-by-frame basis with a frame time equal
to or different from $T_{frame}$.
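The frame segmentation described above can be illustrated with a minimal sketch. The 16 kHz sample rate, the 6 ms frame duration, the 50 % overlap and the function names `hann_window` and `segment_frames` are assumptions chosen for illustration; they are not values or names fixed by the specification.

```python
import math

def hann_window(length):
    """Symmetric Hann (Hanning) window of the given length."""
    if length < 2:
        return [1.0] * length
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (length - 1))
            for i in range(length)]

def segment_frames(signal, sample_rate_hz, frame_ms, overlap):
    """Split signal into Hann-windowed frames of frame_ms milliseconds.

    overlap is the fraction of a frame shared with its successor (0.0 to < 1.0).
    """
    frame_len = int(round(sample_rate_hz * frame_ms / 1000.0))
    hop = max(1, int(round(frame_len * (1.0 - overlap))))
    window = hann_window(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames

# Assumed example values: 1 s of a 440 Hz tone at 16 kHz, 6 ms frames, 50 % overlap.
signal = [math.sin(2.0 * math.pi * 440.0 * n / 16000.0) for n in range(16000)]
frames = segment_frames(signal, 16000, 6.0, 0.5)
print(len(frames), len(frames[0]))
```

With these assumed values each frame holds 96 samples and consecutive frames start 48 samples apart, so the window tapers are compensated by the neighbouring frame.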

The at least one Hidden Markov Model may comprise at least one discrete Hidden
Markov Model, $\lambda^{source} = \{A^{source}, B^{source}, \alpha_0^{source}\}$, wherein $B^{source}$ is an observation symbol
probability distribution matrix which serves as a discrete equivalent of the general
function, $b(O(t))$, defining the probability function for the input observation $O(t)$ for each
state of a Hidden Markov Model. In this discrete case, the processing means are
preferably adapted to compare each of the respective feature vectors, $O(t)$, with a feature
vector set, often denoted a "codebook", to determine, for substantially each of the feature
vectors, an associated symbol value so as to generate an observation sequence of
symbol values associated with the consecutive signal frames. This process of determining
symbol values from the feature vectors is commonly referred to as "vector quantization".
Thereafter, the observation sequence of symbol values is processed with the at least one
discrete Hidden Markov Model, $\lambda^{source}$, which is associated with the predetermined sound
source to determine the element value(s) of the classification vector.

According to a preferred embodiment of the invention, the processing means are adapted
to process the feature vectors with a plurality of Hidden Markov Models, or process the
observation sequence of symbol values with a plurality of discrete Hidden Markov Models.
Each of the discrete Hidden Markov Models or each of the Hidden Markov Models is
preferably associated with a respective predetermined sound source to determine the
element values of the classification vector. Each element value may directly represent a
probability (i.e. a value between 0 and 1) of the associated predetermined sound source
being active in the current listening environment.
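One way to obtain such a classification vector from several discrete HMMs is sketched below. It evaluates each model with the standard scaled forward algorithm and normalises the resulting likelihoods so that the elements sum to one. The normalisation with equal prior weights, the dictionary layout of a model and the toy model values are assumptions made for this sketch; the specification does not prescribe them.

```python
import math

def forward_log_likelihood(A, B, alpha0, symbols):
    """Scaled forward algorithm: log P(symbols | model) for a discrete HMM.

    A[i][j] is the state transition probability, B[i][k] the probability of
    symbol k in state i, alpha0[i] the initial state probability.
    """
    n = len(alpha0)
    alpha = [alpha0[i] * B[i][symbols[0]] for i in range(n)]
    log_like = 0.0
    for t in range(len(symbols)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][symbols[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the sequence is impossible under this model
        alpha = [a / scale for a in alpha]
        log_like += math.log(scale)
    return log_like

def classification_vector(models, symbols):
    """Normalise per-model likelihoods into values between 0 and 1 (equal priors assumed)."""
    lls = [forward_log_likelihood(m["A"], m["B"], m["alpha0"], symbols) for m in models]
    peak = max(lls)
    if peak == float("-inf"):
        return [1.0 / len(models)] * len(models)
    weights = [math.exp(ll - peak) for ll in lls]
    total = sum(weights)
    return [w / total for w in weights]

# Two hypothetical 2-state models over a 2-symbol alphabet.
steady = {"A": [[0.9, 0.1], [0.1, 0.9]], "B": [[0.9, 0.1], [0.8, 0.2]], "alpha0": [0.5, 0.5]}
fluctuating = {"A": [[0.9, 0.1], [0.1, 0.9]], "B": [[0.2, 0.8], [0.1, 0.9]], "alpha0": [0.5, 0.5]}
probabilities = classification_vector([steady, fluctuating], [0, 0, 1, 0, 0, 0])
print(probabilities)
```

The observation sequence consists mostly of symbol 0, so the first element of `probabilities` dominates. Scaling at every step keeps the recursion numerically stable for long sequences.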

The duration of one of the signal frames, $T_{frame}$, is preferably selected to be within the
range 1-100 milliseconds, such as about 5-10 milliseconds. Such a time duration allows
the applied Hidden Markov Model(s) to operate on time scales of the input signal that are
comparable to individual features, e.g. phonemes, of speech signals and on envelope
modulations of a number of relevant acoustic noise sources.

A predetermined sound source may be any natural or synthetic sound source such as a
natural speech source, a telephone speech source, a traffic noise source, a multi-talker or
babble source, a subway noise source, a transient noise source or a wind noise source. A
predetermined sound source may also be constituted by a mixture of natural speech
and/or traffic noise and/or babble mixed together in predetermined proportions to e.g.
create a particular signal to noise ratio (SNR) in that predetermined sound source. For
example, a predetermined sound source may be speech and babble mixed in a proportion
that creates a particular target SNR such as 5 dB or 10 dB or more preferably 20 dB. The
Hidden Markov Model associated with such a mixed speech-babble sound source will
then, through the classification vector, be able to indicate how well a current input signal or
signals fit this speech-babble sound source. The processing means can consequently
select appropriate signal processing parameters based on both the interfering noise type
and the actual signal to noise ratio.
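A mixed sound source with a target signal to noise ratio can be produced by scaling the noise before adding it to the speech. The sketch below assumes that the SNR is defined as the ratio of mean signal powers over the whole recording; the function names and the synthetic test signals are illustrative placeholders, not recordings from the specification.

```python
import math

def mean_power(samples):
    """Average squared amplitude of a sample sequence."""
    return sum(v * v for v in samples) / len(samples)

def mix_at_snr(speech, noise, snr_db):
    """Scale noise so that speech power / noise power matches snr_db, then add.

    Returns the mixture and the scaled noise.
    """
    target_ratio = 10.0 ** (snr_db / 10.0)
    gain = math.sqrt(mean_power(speech) / (mean_power(noise) * target_ratio))
    scaled_noise = [gain * v for v in noise]
    mixture = [s + n for s, n in zip(speech, scaled_noise)]
    return mixture, scaled_noise

# Synthetic stand-ins for a speech recording and a babble recording.
speech = [math.sin(2.0 * math.pi * 0.01 * n) for n in range(2000)]
babble = [((n * 7919) % 101) / 50.0 - 1.0 for n in range(2000)]
mixture, scaled = mix_at_snr(speech, babble, 20.0)
achieved_db = 10.0 * math.log10(mean_power(speech) / mean_power(scaled))
print(round(achieved_db, 6))
```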

Temporal and spectral characteristics of each of these predetermined sound sources may
have been obtained based on real-life recordings of one or several representative sound
sources. The temporal and spectral characteristics for each type of predetermined sound
source are preferably obtained by performing real-life recordings of a number of such
representative sound sources and concatenating these recordings in a single recording (or
sound file). For speech sound sources, the present inventors have found that utilising
about 10 different speakers, preferably 5 males and 5 females, will generally provide good
classification results in the Hidden Markov Model associated with the speech source. The
mixed sound source type is preferably provided by post-processing of one or several of
the real-life recordings to obtain desired specific characteristics of the mixed sound source
such as a predetermined signal to noise ratio.

When the concatenated sound source recording has been formed, feature vectors,
preferably identical to those feature vectors that are generated by the processing means in
the hearing prosthesis, are extracted from the concatenated sound source recording to
form a training observation sequence for the associated continuous or discrete HMM. The
duration of the training sequence depends on the type of sound source, but it has been
found that a duration of about 3-20 minutes, such as about 4-6 minutes, is adequate for
many types of sound sources including speech sound sources. Thereafter, for each
predetermined sound source, the corresponding HMM is trained with the generated
training observation sequence, preferably by the Baum-Welch iterative algorithm, to
obtain values of $A^{source}$, the state transition probability matrix, values of $B^{source}$, the
observation symbol probability distribution matrix (for discrete HMMs), and values of
$\alpha_0^{source}$, the initial state probability distribution vector. If the HMM is ergodic, the values of
the initial state probability distribution vector are determined from the state transition
probability matrix.
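For an ergodic model, one common way to derive the initial state probabilities from the state transition probability matrix is to take the stationary distribution, i.e. the vector that satisfies $\alpha_0 = \alpha_0 A$. Reading the passage above in this way is an assumption of the sketch, as are the function name and the 2-state example matrix.

```python
def stationary_distribution(A, iterations=1000, tol=1e-12):
    """Stationary distribution of a row-stochastic matrix A by power iteration.

    Repeatedly applies pi <- pi * A starting from a uniform vector; for an
    ergodic chain this converges to the unique solution of pi = pi * A.
    """
    n = len(A)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(pi[i] * A[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi

# Hypothetical 2-state ergodic transition matrix.
A_example = [[0.9, 0.1],
             [0.3, 0.7]]
alpha0_example = stationary_distribution(A_example)
print(alpha0_example)
```

For this example the balance condition 0.1 * p0 = 0.3 * p1 gives the distribution (0.75, 0.25), which the iteration reproduces.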






The feature vectors that are generated from the consecutive signal frames may represent
spectral properties of the signal frames, temporal properties of the signal frames or any
combination of these. The spectral properties may be expressed in the form of Discrete
Fourier Transform coefficients, Linear Predictive Coding parameters, cepstrum
parameters or corresponding differential cepstrum parameters.
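As an illustration of cepstrum and differential (delta) cepstrum parameters, the sketch below computes a real cepstrum from the log magnitude spectrum of a frame and forms the delta as a simple first difference between consecutive frames. The first-difference definition, the omission of the zeroth coefficient, the small floor added before the logarithm and the function names are assumptions of this sketch. A naive DFT is used to keep the example dependency-free.

```python
import cmath
import math

def real_cepstrum(frame, n_coeffs):
    """Cepstrum coefficients 1..n_coeffs of a frame (naive DFT, O(n^2))."""
    n = len(frame)
    log_mag = []
    for k in range(n):
        bin_value = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                        for t in range(n))
        log_mag.append(math.log(abs(bin_value) + 1e-12))  # floor avoids log(0)
    # The log magnitude spectrum of a real frame is even, so the inverse DFT
    # reduces to a cosine sum.
    return [sum(log_mag[k] * math.cos(2.0 * math.pi * k * q / n) for k in range(n)) / n
            for q in range(1, n_coeffs + 1)]

def delta_cepstrum(current, previous):
    """First difference between the cepstra of two consecutive frames."""
    return [c - p for c, p in zip(current, previous)]

# Two hypothetical frames that differ only by a gain factor of 2.
frame_a = [math.sin(0.3 * t) + 0.5 * math.cos(1.1 * t) + 0.1 * ((t * 37) % 11 - 5)
           for t in range(32)]
frame_b = [2.0 * v for v in frame_a]
cep_a = real_cepstrum(frame_a, 12)
cep_b = real_cepstrum(frame_b, 12)
delta = delta_cepstrum(cep_b, cep_a)
print(max(abs(d) for d in delta))
```

A pure gain change shifts the log spectrum by a constant, which affects only the omitted zeroth coefficient, so the delta between `frame_a` and `frame_b` is essentially zero. This is one reason such features can be robust to level differences.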

If a discrete HMM or HMMs are utilised, the codebook may have been determined by an
off-line training procedure which utilised real-life sound source recordings. The number of
feature vectors that constitute the codebook may vary depending on the particular
application, but for hearing aid applications, it has been found that a codebook comprising
between 8 and 256 different feature vectors, such as 32 - 64 different feature vectors,
usually will provide an adequate coverage of the complete feature space. The comparison
between each of the feature vectors computed from the consecutive signal frames and
the codebook provides a symbol value which may be selected by choosing an integer
index belonging to that codebook entry nearest to the feature vector in question. Thus, the
output of this vector quantization process may be a sequence of integer indexes
representing the corresponding symbol values.
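The nearest-entry lookup can be sketched as follows. Squared Euclidean distance is assumed as the measure of "nearest", and the tiny 4-entry, 2-dimensional codebook is a made-up example; a real codebook would hold, for example, 64 entries of 12 parameters each.

```python
def quantize(feature, codebook):
    """Return the index of the codebook entry nearest to feature.

    Nearness is measured by squared Euclidean distance (an assumption).
    """
    best_index, best_dist = 0, float("inf")
    for index, entry in enumerate(codebook):
        dist = sum((f - e) ** 2 for f, e in zip(feature, entry))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index

# Hypothetical codebook and feature vectors.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
features = [[0.1, -0.2], [0.9, 0.8], [0.2, 0.7]]
symbols = [quantize(f, codebook) for f in features]
print(symbols)  # [0, 3, 2]
```

The resulting integer sequence is the observation sequence that a discrete HMM consumes.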

To generate the codebook so as to closely resemble feature vectors that are generated in
the hearing prosthesis during on-line processing of the input signal, i.e. normal use, the
real-life sound recordings may have been made by passing the signal through an input
signal path of a target hearing prosthesis. By adopting such a procedure, frequency
response deviations as well as other linear and/or non-linear distortions generated by the
input signal path of the target hearing prosthesis can be compensated by introducing
corresponding signal characteristics into the codebook. Thus, a close resemblance
between the feature vector set and on-line generated feature vectors is secured to
optimise recognition and classification results from the subsequent processing in the
discrete Hidden Markov Model or Models. A similar advantageous effect may, naturally,
be obtained by performing a pre-processing of the real-life sound recordings which is
substantially similar to the processing of the input signal path of a target hearing
prosthesis before extraction of the feature vector set or codebook is performed. The latter
solution could be implemented by applying suitable analogue and/or digital filters or filter
algorithms to the input signal tailored to simulate a priori known characteristics of the input
signal path in question.

While it has proven helpful to utilise so-called left-to-right Hidden Markov Models in the
field of speech recognition where the known temporal characteristics of words and
utterances are matched in a structure of the model, the present inventors have found it
advantageous to use at least one ergodic Hidden Markov Model, and, preferably, to use
ergodic Hidden Markov Models for all applied Hidden Markov Models. An ergodic Hidden
Markov Model is a model in which it is possible to reach any internal state from any other
internal state in the model.

The number of internal model states of any particular HMM of the plurality of HMMs may
depend on the particular type of predetermined sound source modelled. A relatively
simple nearly constant noise source may be adequately modelled by an HMM with only a
few internal states while more complex sound sources such as speech or mixed speech
and complex noise sources may require additional internal states. Preferably, the at least
one Hidden Markov Model or each of the plurality of Hidden Markov Models comprises
between 2 and 10 states, such as between 3 and 8 states. According to a preferred
embodiment of the invention, four discrete HMMs are used in a proprietary DSP in a
hearing instrument, where each of the four HMMs has 4 internal states. The four internal
states are associated with four common predetermined sound sources: speech source,
traffic noise source, multi-talker or babble source, and subway noise source, respectively.
A codebook with 64 feature vectors, each consisting of 12 delta-cepstrum parameters, is
utilised to provide vector quantisation of the feature vectors derived from the input signal
of the hearing aid. However, the feature vector set may comprise between 8 and 256
different feature vectors, such as 32 - 64 different feature vectors, without taking up
an excessive amount of memory in the hearing aid DSP.
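The storage implied by the preferred embodiment can be estimated with simple arithmetic. The sketch below counts stored values for four discrete HMMs with 4 states each and a 64-entry codebook of 12 parameters per entry. The 16-bit word size used for the byte estimate is an assumption and is not stated in the specification.

```python
def hmm_parameter_count(n_states, n_symbols):
    """Values stored for one discrete HMM: A (N x N), B (N x M), alpha0 (N)."""
    return n_states * n_states + n_states * n_symbols + n_states

n_models = 4        # four discrete HMMs
n_states = 4        # internal states per HMM
codebook_size = 64  # feature vectors in the codebook, also the symbol alphabet size
feature_dim = 12    # delta-cepstrum parameters per feature vector

per_model = hmm_parameter_count(n_states, codebook_size)  # 16 + 256 + 4
codebook_values = codebook_size * feature_dim             # 64 * 12
total_values = n_models * per_model + codebook_values
total_bytes_16bit = 2 * total_values                      # assumed 16-bit words

print(per_model, codebook_values, total_values, total_bytes_16bit)
```

Under these assumptions the observation matrices dominate the model storage, so the codebook size has the largest influence on the memory footprint.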

WO 01/76321

PCT/DK01/00226

The processing means may be adapted to process the input signal in accordance with at
least two different predetermined signal processing algorithms, each being associated
with a set of algorithm parameters, where the processing means are further adapted to
control a transition between the at least two predetermined signal processing algorithms
in dependence on the element value(s) of the classification vector. This embodiment of the
invention is particularly useful where the hearing prosthesis is equipped with two closely
spaced microphones, such as a pair of omni-directional microphones, generating a pair of
input signals which can be utilised to provide a directional signal mode by well-known
delay-subtract techniques and a non-directional signal mode, e.g. by processing only one
of the input signals. The processing means may control a transition between the
directional and the omni-directional mode in a smooth manner through a range of
intermediate values of the algorithm parameters so that the directionality of the processed
output signal gradually increases/decreases. The user will thus not experience abrupt
changes in the reproduced sound but rather e.g. a smooth improvement in signal to noise
ratio.
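A smooth transition between the omni-directional and the directional mode can be sketched with a delay-subtract beamformer whose output is cross-faded with the front microphone signal. The integer-sample delay, the linear cross-fade weight and the function names are assumptions chosen for illustration.

```python
import math

def delay_subtract_blend(front, rear, delay_samples, weights):
    """Blend omni (front microphone only) and delay-subtract directional output.

    weights[n] = 0.0 gives pure omni-directional output and 1.0 gives pure
    directional output; values in between give intermediate directionality.
    """
    out = []
    for n, w in enumerate(weights):
        delayed_rear = rear[n - delay_samples] if n >= delay_samples else 0.0
        omni = front[n]
        directional = front[n] - delayed_rear
        out.append((1.0 - w) * omni + w * directional)
    return out

def linear_ramp(length, start, stop):
    """Weights moving linearly from start to stop over length samples."""
    if length == 1:
        return [stop]
    return [start + (stop - start) * i / (length - 1) for i in range(length)]

# Hypothetical sound arriving from behind: it reaches the rear microphone
# first and the front microphone delay samples later.
delay = 2
rear = [math.sin(0.2 * n) for n in range(200)]
front = [rear[n - delay] if n >= delay else 0.0 for n in range(200)]
out = delay_subtract_blend(front, rear, delay, linear_ramp(200, 0.0, 1.0))
print(out[100], out[-1])
```

As the weight ramps towards 1.0 the sound from behind is attenuated progressively and is cancelled completely at the end of the ramp, which avoids an abrupt change in the reproduced sound.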

To control such transitions between two predetermined signal processing algorithms, the
processing means may further comprise a decision controller adapted to monitor the
elements of the classification vector and control transitions between the plurality of Hidden
Markov Models in accordance with a predetermined set of rules. The decision controller
may advantageously operate as an intermediate layer between the classification vector
provided by the HMMs and the one or plurality of related algorithm parameters. By
monitoring element values of the classification vector and controlling the value(s) of the
related algorithm parameter(s) in accordance with rules about maximum and minimum
switching times between HMMs and, optionally, interpolation characteristics between the
algorithm parameters, the inherent time scales that the HMMs operate on can be
smoothed. If, for example, a number of discrete HMMs operate on consecutive symbol
values that each represent a time frame of about 6 ms, it may be advantageous to
lowpass filter or smooth rapid transitions between a speech HMM and a babble noise HMM
that are caused by pauses between words in conversational speech in a "cocktail party"
type listening environment. Instead of performing an instantaneous switch between the
two predetermined signal processing algorithms for every model transition, suitable time
constants and hysteresis could be provided in the decision controller.
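A rule-based decision controller of this kind can be sketched as a first-order lowpass filter on the classification vector combined with a hysteresis margin and a minimum hold time. The class name, the 2 s time constant, the 0.1 margin and the 1 s hold time are assumed values for illustration only; the 6 ms frame period follows the example in the text.

```python
import math

class DecisionController:
    """Smooths classification vectors and switches class only on sustained evidence."""

    def __init__(self, n_classes, frame_ms=6.0, time_constant_ms=2000.0,
                 margin=0.1, min_hold_ms=1000.0):
        self.coeff = math.exp(-frame_ms / time_constant_ms)  # lowpass coefficient
        self.smoothed = [1.0 / n_classes] * n_classes
        self.margin = margin                                  # hysteresis margin
        self.min_hold_frames = int(min_hold_ms / frame_ms)    # minimum switching time
        self.current = 0
        self.frames_since_switch = 0

    def update(self, classification):
        """Feed one classification vector and return the selected class index."""
        self.smoothed = [self.coeff * s + (1.0 - self.coeff) * c
                         for s, c in zip(self.smoothed, classification)]
        self.frames_since_switch += 1
        best = max(range(len(self.smoothed)), key=self.smoothed.__getitem__)
        if (best != self.current
                and self.frames_since_switch >= self.min_hold_frames
                and self.smoothed[best] > self.smoothed[self.current] + self.margin):
            self.current = best
            self.frames_since_switch = 0
        return self.current

# Class 0 = speech, class 1 = babble (hypothetical labelling).
controller = DecisionController(n_classes=2)
for _ in range(1000):                      # 6 s of speech
    controller.update([1.0, 0.0])
for _ in range(50):                        # 300 ms pause classified as babble
    state_after_pause = controller.update([0.0, 1.0])
for _ in range(2000):                      # 12 s of sustained babble
    state_after_change = controller.update([0.0, 1.0])
print(state_after_pause, state_after_change)
```

With these assumed settings a 300 ms pause between words does not trigger a switch, while a sustained change of environment does.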

According to a preferred embodiment of the invention, the decision controller comprises a
second set of HMMs operating on a substantially longer time scale of the input signal than
the HMM(s) in a first layer. Thereby, the processing means are adapted to process the
observation sequence of symbol values or the feature vectors with a first set of Hidden
Markov Models operating at a first time scale and associated with a first set of
predetermined sound sources to determine element values of a first classification vector.
Subsequently, the first classification vector is processed with the second set of Hidden
Markov Models operating at a second time scale and associated with a second set of
predetermined sound sources to determine element values of a second classification
vector.

The first time scale is preferably selected within the range 10 - 100 ms to allow the first
set of HMMs to operate on individual signal features of common speech and noise signals
and the second time scale is preferably selected within the range 1-60 seconds, such as
about 10 or 20 seconds, to allow the second set of HMMs to operate on changes between
different listening environments. Environmental changes usually occur when the user of
the hearing prosthesis moves between differing listening environments, e.g. between a
subway station and the interior of a train or a domestic environment, or between an interior
of a car and standing near a street with passing traffic etc.

A second aspect of the invention relates to a method of generating automatic 
classification of input signals in a hearing prosthesis, the method comprising the steps of: 


receiving an acoustic signal from a listening environment by a microphone of the hearing 
prosthesis to generate an input signal, 

processing the input signal in accordance with a predetermined signal processing 
algorithm and a plurality of related algorithm parameters stored in a memory area to
generate a processed output signal, 

segmenting the input signal into consecutive signal frames of time duration, $T_{\mathrm{frame}}$,

generating respective feature vectors, $O(t)$, representing predetermined signal features
of the consecutive signal frames,

processing the feature vectors with at least one Hidden Markov Model,
$\lambda^{\mathrm{source}} = \{A^{\mathrm{source}}, b(O(t)), \alpha_0^{\mathrm{source}}\}$, associated with a predetermined sound source to
determine element value(s) of a classification vector indicating a probability of the
predetermined sound source being active in the listening environment,

controlling one or several values of the related algorithm parameters in dependence of 
element value(s) of the classification vector to control characteristics of the processed 
output signal,

converting the processed output signal into an electrical or an acoustic output signal or
signals by one or several output transducers,

thereby adapting characteristics of the predetermined signal processing algorithm to the
current listening environment; wherein

$A^{\mathrm{source}}$ = A state transition probability matrix;
$b(O(t))$ = Probability function for the observation $O(t)$ for each state of the at least one
Hidden Markov Model;

$\alpha_0^{\mathrm{source}}$ = An initial state probability distribution vector.

The feature vectors may be subjected to a vector quantisation process by comparing each
of the respective feature vectors, $O(t)$, with a feature vector set or codebook, and
determining, for substantially each feature vector, an associated symbol value so as to
generate an observation sequence of symbol values associated with the consecutive
signal frames. By processing the observation sequence of symbol values with at least one
discrete Hidden Markov Model, $\lambda^{\mathrm{source}} = \{A^{\mathrm{source}}, B^{\mathrm{source}}, \alpha_0^{\mathrm{source}}\}$, associated with the
predetermined sound source, the element value or values of the classification vector may
be determined; wherein

$B^{\mathrm{source}}$ = An observation symbol probability distribution matrix.

For hearing aid applications, it has been found useful to utilise at least a few HMMs in
order to recognise at least a few corresponding and common listening environments, so
that the method may comprise processing the feature vectors with a plurality of Hidden
Markov Models, or processing the observation sequence of symbol values with a
plurality of discrete Hidden Markov Models. According to this embodiment of the
invention, each of the discrete Hidden Markov Models or the Hidden Markov Models is
associated with a respective predetermined sound source to determine the element
values of the classification vector, each element value indicating a probability of the
respective predetermined sound source being active in the current listening environment.

According to a third aspect of the invention, a set of HMMs is utilised to recognise
respective isolated words to provide the hearing prosthesis with a capability of identifying
a small set of voice commands which the user may utilise to control one or several
functions of the hearing aid by his or her voice. For this word recognition feature, discrete
left-right HMMs are preferably utilised rather than the ergodic HMMs that are preferred
for the task of providing automatic listening environment classification. Since a
left-right HMM is a special case of an ergodic HMM, the HMM structure that is used for the
above-described ergodic HMMs may be at least partly re-used for the left-right HMMs.
This has the advantage that DSP memory and other hardware resources may be shared
in a hearing prosthesis that provides both automatic listening environment classification
and word recognition. Preferably, a number of isolated word HMMs, such as 2 - 8 HMMs,
is stored in the hearing prosthesis to allow the processing means to recognise a
corresponding number of distinct words. The output from each of the isolated word HMMs
is a probability for a modelled word being spoken. Each of the isolated word HMMs must
be trained on the particular word or command it must recognise during on-line processing
of the input signal. The training could be performed by applying a concatenated sound
source recording including the particular word or command spoken by a number of
different individuals to the associated HMM. Alternatively, the training of the isolated word
HMMs could be performed during a fitting session where the words or commands
modelled are spoken by the user himself to provide a personalised recognition function
in the user's hearing prosthesis.
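
The re-use of the ergodic structure for left-right models can be illustrated with a minimal sketch: a left-right transition matrix is an ergodic one in which every transition to a lower-numbered state is zero. The function name `make_left_right` and the example matrix values below are illustrative assumptions and are not taken from the embodiment.

```python
def make_left_right(A):
    """Zero all transitions to lower-numbered states and renormalise each row.

    A[i][j] is assumed to hold the probability of moving from state i to state j.
    Each row is assumed to keep at least one non-zero entry on or right of the diagonal.
    """
    constrained = []
    for i, row in enumerate(A):
        kept = [p if j >= i else 0.0 for j, p in enumerate(row)]
        total = sum(kept)
        constrained.append([p / total for p in kept])
    return constrained


# Hypothetical 3-state ergodic transition matrix (rows sum to 1).
ergodic = [
    [0.5, 0.3, 0.2],
    [0.2, 0.5, 0.3],
    [0.3, 0.2, 0.5],
]
left_right = make_left_right(ergodic)
```

Because only the values stored in the matrix change, the same storage layout and update routine can serve both model types, which is the sharing of resources described above.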

BRIEF DESCRIPTION OF THE DRAWINGS 

A preferred embodiment of a software programmable DSP based hearing aid according to
the invention is described in the following with reference to the drawings, wherein

Fig. 1 is a simplified block diagram of a three-chip DSP based hearing aid utilising Hidden
Markov Models for input signal classification according to the invention,

Fig. 2 is a signal flow diagram of a predetermined signal processing algorithm executed
on the three-chip DSP based hearing aid shown in Fig. 1,

Fig. 3 is a signal flow diagram illustrating a listening environment classification process,

Fig. 4 is a state diagram for the environment Hidden Markov Model shown in Fig. 3 as
block 550.



DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT

In the following, a specific embodiment of a three chip-set DSP based hearing aid
according to the invention is described and discussed in greater detail. The present
description discusses in detail only the operation of the signal processing part of a
DSP-core or kernel with associated memory circuits. An overall circuit topology that may
form the basis of the DSP hearing aid is well known to the skilled person and is,
accordingly, reviewed in very general terms only.

In the simplified block diagram of Fig. 1, a conventional hearing aid microphone 105
receives an acoustic signal from a surrounding listening environment. The microphone
105 provides an analogue input signal on terminal MIC1IN of a proprietary A/D integrated
circuit 102. The analogue input signal is amplified in a microphone preamplifier 106 and
applied to an input of a first A/D converter of a dual A/D converter circuit 110 comprising
two synchronously operating converters of the sigma-delta type. A serial digital data
stream or signal is generated in a serial interface circuit 111 and transmitted from terminal
A/DDAT of the proprietary A/D integrated circuit 102 to a proprietary Digital Signal
Processor circuit 2 (DSP circuit). The DSP circuit 2 comprises an A/D decimator 13 which
is adapted to receive the serial digital data stream and convert it into corresponding 16 bit
audio samples at a lower sampling rate for further processing in a DSP core 5. The DSP
core 5 has an associated program Random Access Memory (program RAM) 6, data RAM 7
and Read Only Memory (ROM) 8. The signal processing of the DSP core 5, which is
described below with reference to the signal flow diagram in Fig. 2, is controlled by
program instructions read from the program RAM 6.

A serial bi-directional 2-wire programming interface 300 allows a host programming 
system (not shown) to communicate with the DSP circuit 2, over a serial interface circuit 
12, and a commercially available EEPROM 202 to perform up/downloading of signal 
processing algorithms and/or associated algorithm parameter values. 

A digital output signal generated by the DSP-core 5 from the analogue input signal is 
transmitted to a Pulse Width Modulator circuit 14 that converts received output samples to 
a pulse width modulated (PWM) and noise-shaped processed output signal. The 
processed output signal is applied to two terminals of a hearing aid receiver 10 which, by
its inherent low-pass filter characteristic, converts the processed output signal to a



WO 01/76321 



PCT/DK01/00226 



corresponding acoustic audio signal. An internal clock generator and amplifier 20 receives
a master clock signal from an LC oscillator tank circuit formed by L1 and C5 that, in
co-operation with an internal master clock circuit 112 of the A/D circuit 102, forms a master
clock for both the DSP circuit 2 and the A/D circuit 102. The DSP-core 5 may be directly
clocked by the master clock signal or from a divided clock signal. The DSP-core 5 is
preferably clocked with a frequency of about 2 - 4 MHz.

Fig. 2 illustrates a relatively simple application of discrete Hidden Markov Models to
control algorithm parameter values of a predetermined signal processing algorithm of the
DSP based hearing aid shown in Fig. 1. The discrete Hidden Markov Models are used in
the hearing aid or instrument to provide automatic classification of three different listening
environments: speech in traffic noise, speech in babble noise, and clean speech, as
illustrated in Fig. 4. In the present embodiment of the invention, each listening
environment is associated with a particular pre-set frequency response implemented by
FIR-filter block 450 that receives its filter parameter values from a filter choice controller
430. Operations of both the FIR-filter block 450 and the filter choice controller 430 are
preferably performed by respective sub-routines executed on the DSP core 5. Switching
between different FIR-filter parameter values is automatically performed when the user of
the hearing aid moves between different listening environments, which is detected by
a listening environment classification algorithm 420 comprising two sets of discrete
HMMs operating at differing time scales, as will be explained with reference to Figs. 3 and
4. Another possibility is to let the listening environment classifier 420 supplement an
additional multi-channel AGC algorithm or system, which could be inserted between the
input (IN) and the FIR-filter block 450, calculating, or determining by table lookup, gain
values for consecutive signal frames of the input signal.

The user may have a favorite frequency response/gain for each of the listening 
environments that can be recognized/classified by its corresponding discrete Hidden 
Markov Model. These favorite frequency responses/gains may be found by applying a 
number of standard prescription methods, such as NAL, POGO etc., combined with
individual interactive fine-tuning methods. 

In Fig. 2, a raw input signal at node IN, provided by the output of the A/D decimator 13 in 
Fig. 1, is segmented to form consecutive signal frames, each with a duration of 6 ms. The 
input signal is preferably sampled at 16 kHz at this node so that each frame consists of 96 
audio signal samples. The signal processing is performed along two different paths:
a classification path through signal blocks 410, 420, 440 and 430, and a predetermined
signal processing path through block 450. Pre-computed impulse responses of the
respective FIR filters are stored in the data RAM during program execution. The choice of
parameter values or coefficients for the FIR filter block 450 is performed by the Filter
Choice Block 430 based on the element values of the classification vector and, optionally,
on data from the Spectrum Estimation Block 440.
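
The frame arithmetic stated above (6 ms at 16 kHz giving 96 samples per frame) can be checked with a short sketch. It assumes non-overlapping frames and drops an incomplete final frame; the function name `segment_into_frames` is hypothetical.

```python
def segment_into_frames(samples, sample_rate_hz=16000, frame_ms=6):
    """Split a sample sequence into consecutive, non-overlapping frames.

    An incomplete frame at the end of the sequence is discarded (an assumption
    made for this sketch).
    """
    frame_len = sample_rate_hz * frame_ms // 1000  # 16000 * 6 / 1000 = 96 samples
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


# 200 dummy samples yield two complete frames of 96 samples each.
frames = segment_into_frames(list(range(200)))
```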

Fig. 3 shows a signal flow diagram of a preferred implementation of the classification
block 420 of Fig. 2. A vector quantizer (VQ) block 510 precedes the dual layer HMM
architecture, where blocks 520, 521, 522 form a first HMM layer and block 550 forms a
second HMM layer. The system therefore consists of four stages: a feature extraction
layer 500, a sound feature classification layer 510, the first HMM layer in the form of a
sound source classification layer 520-522 and the second HMM layer in the form of a
listening environment classification layer 550. The sound source classification layer uses
three or five Hidden Markov Models and a single HMM is used in the listening environment
classification layer 550.

The structure of the classification block 420 makes it possible to have different switching 
times between different listening environments, e.g. slow switching between traffic and 
babble and fast switching between traffic and speech.

The output signal OUT1 of classification block 420 is a classification vector, in which each 
element contains the probability that a particular sound source of the three pre- 
determined sound sources 520, 521, 522 modelled by their respective discrete HMMs is 
active. The output signal OUT2 is another classification vector, in which each element
contains the probability that a particular listening environment is active. 

The processing of the input signal in the above-mentioned classification path is described
in the following with reference to the implementation in Fig. 3:

The input at time $t$ is a block, $x(t)$, of size $B$, with input signal samples:

$$x(t) = [x_1(t) \; \cdots \; x_B(t)]^T$$

$x(t)$ is multiplied with a window, $w_n$, and the Discrete Fourier Transform, DFT, is
calculated:

$$X_k(t) = \sum_{n=0}^{B-1} w_n \, x_{n+1}(t) \, e^{-j 2\pi k n / B}, \qquad k = 0, \ldots, B-1$$

A feature vector is extracted or computed for every new frame. It is presently preferred to
use 12 cepstrum parameters for each feature vector:

$$c_k(t) = \sum_{b=0}^{B-1} \cos\!\left(\frac{2\pi k b}{B}\right) \log |X_b(t)|, \qquad k = 0, \ldots, 11$$

The output at time $t$ is a feature column vector, $f(t)$, with continuous valued elements:

$$f(t) = [c_0(t) \; c_1(t) \; \cdots \; c_{11}(t)]^T$$
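
The feature extraction step can be sketched as follows: window the block, take the DFT, and form 12 cepstrum parameters as a cosine transform of the log magnitude spectrum. This is a sketch under stated assumptions: the Hann window, the small magnitude floor that avoids taking the logarithm of zero, the absence of any normalisation factor, and the function names are all choices made for illustration, and the direct DFT is written for clarity rather than speed.

```python
import cmath
import math


def windowed_dft(block):
    """Direct DFT of a Hann-windowed block (window choice is an assumption)."""
    B = len(block)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (B - 1)) for n in range(B)]
    return [
        sum(window[n] * block[n] * cmath.exp(-2j * math.pi * k * n / B) for n in range(B))
        for k in range(B)
    ]


def cepstrum(block, n_coeffs=12, floor=1e-12):
    """Cosine transform of the log magnitude spectrum, first n_coeffs terms."""
    B = len(block)
    log_mag = [math.log(max(abs(X), floor)) for X in windowed_dft(block)]
    return [
        sum(math.cos(2 * math.pi * k * b / B) * log_mag[b] for b in range(B))
        for k in range(n_coeffs)
    ]


# One 96-sample frame of a 1 kHz tone sampled at 16 kHz.
frame = [math.sin(2 * math.pi * 1000 * n / 16000) for n in range(96)]
feature = cepstrum(frame)
```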

The corresponding differential cepstrum parameter vector (often called delta-cepstrum) is
calculated as $\Delta f(t) = \sum_{i=0}^{K-1} h_i \, f(t-i)$, where $h_i$ is determined such that $\Delta f(t)$ approximates
the first differential of $f(t)$ with respect to the time $t$. A preferred length of the filter
defined by the coefficients $h_i$ is $K = 8$.
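
One way to obtain filter coefficients that approximate the first time derivative is a least-squares slope fit over the last K frames. The text does not state how the coefficients are chosen, so the slope fit below is an assumption, as are the names `slope_filter` and `delta_features`.

```python
def slope_filter(K=8):
    """Coefficients h[0..K-1] of a least-squares slope estimator.

    h[i] weights the feature vector from i frames ago, so the newest frames get
    positive weights and the oldest negative ones.
    """
    centre = (K - 1) / 2
    denom = sum((centre - i) ** 2 for i in range(K))
    return [(centre - i) / denom for i in range(K)]


def delta_features(history, h):
    """Return sum_i h[i] * f(t - i), where history[i] holds f(t - i)."""
    dim = len(history[0])
    return [sum(h[i] * history[i][d] for i in range(len(h))) for d in range(dim)]


h = slope_filter(8)
# A one-dimensional feature that grows by 2 per frame, observed at t = 10.
history = [[2.0 * (10 - i)] for i in range(8)]
delta = delta_features(history, h)
```

For a feature that changes linearly with time, this filter returns the slope per frame, which is the behaviour expected of a first-difference approximation.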

The delta-cepstrum coefficients are sent to the vector quantizer in the classification block 
420. Other features, e.g. time domain features or other frequency-based features, may be 
added.

The classification block 420 comprises three layers operating at different time scales: (1)
a Short-term Layer (Sound Feature Classification) 510, operating instantly on each signal
frame, (2) a Medium-term Layer (Sound Source Classification) 520-522, operating in the
time-scale of envelope modulations within predetermined sound sources modelled by the
four HMMs, and (3) a Long-term Layer (Listening Environment Classification) 550,
operating in a slower time-scale corresponding to shifts between different sound sources
in a given listening environment or the shift between different listening environments. This
is further illustrated in Fig. 4.

The predetermined sound sources modelled by the present embodiment of the invention
are a traffic noise source, a babble noise source, and a clean speech source, but could also
comprise mixed sound sources that each may contain a predetermined proportion of e.g.
speech and babble or speech and traffic noise, as illustrated in Fig. 4. The final output of
the classifier is a listening environment probability vector, OUT1, continuously indicating a
current probability estimate for each listening environment, and a sound source probability
vector, OUT2, indicating the estimated probability for each sound source. A listening
environment may consist of one of the predetermined sound sources 520-522 or a
combination of two or more of the predetermined sound sources as illustrated in more
detail in the description of Fig. 4.

The input to the vector quantizer block 510 is a feature vector with continuously valued
elements. The vector quantizer has $M$, e.g. 32, codewords in the codebook $[c_1 \ldots c_M]$
approximating the complete feature space. The feature vector is quantized to the closest
codeword in the codebook, and the index $O(t)$, an integer between 1 and $M$, of the
closest codeword is generated as output.
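
The quantisation step amounts to a nearest-neighbour search over the codebook. The sketch below assumes squared Euclidean distance as the closeness measure, which the text does not specify; the function name `quantize` and the toy codebook are hypothetical.

```python
def quantize(feature, codebook):
    """Return the 1-based index of the codeword closest to the feature vector."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    best = min(range(len(codebook)), key=lambda m: squared_distance(feature, codebook[m]))
    return best + 1  # symbol values run from 1 to M


# Toy two-dimensional codebook with M = 3 codewords.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
symbol = quantize([0.9, 1.2], codebook)
```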



The VQ is trained off-line with the Generalized Lloyd algorithm (Linde, 1980). The training
material consisted of real-life recordings of sound source samples. These recordings
have been made through the input signal path, shown in Fig. 1, of the DSP based
hearing instrument.
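
The core iteration of the Generalized Lloyd algorithm alternates between assigning each training vector to its nearest codeword and moving each codeword to the centroid of its cell. The sketch below is a simplified illustration and not the training procedure actually used: it assumes squared Euclidean distance, a fixed iteration count, initial codewords taken at regular intervals from the training set, and at least M training vectors. All names are hypothetical.

```python
def nearest(vector, codebook):
    """0-based index of the codeword closest to the vector (squared Euclidean)."""
    return min(
        range(len(codebook)),
        key=lambda m: sum((a - b) ** 2 for a, b in zip(vector, codebook[m])),
    )


def generalized_lloyd(training, M, iterations=20):
    """Train an M-codeword codebook by repeated assignment and centroid update."""
    step = max(1, len(training) // M)
    codebook = [list(v) for v in training[::step][:M]]
    for _ in range(iterations):
        cells = [[] for _ in codebook]
        for v in training:
            cells[nearest(v, codebook)].append(v)
        for m, cell in enumerate(cells):
            if cell:  # an empty cell keeps its previous codeword
                dim = len(cell[0])
                codebook[m] = [sum(v[d] for v in cell) / len(cell) for d in range(dim)]
    return codebook


# Two well-separated one-dimensional clusters.
training = [[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]]
trained = generalized_lloyd(training, M=2)
```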

Each of the three sound sources is modelled by a respective discrete HMM. Each HMM
consists of a state transition probability matrix, $A^{\mathrm{source}}$, an observation symbol probability
distribution matrix, $B^{\mathrm{source}}$, and an initial state probability distribution column vector,
$\alpha_0^{\mathrm{source}}$. A compact notation for an HMM is $\lambda^{\mathrm{source}} = \{A^{\mathrm{source}}, B^{\mathrm{source}}, \alpha_0^{\mathrm{source}}\}$. Each sound
source model has $N = 4$ internal states and observes the stream of VQ symbol values or
centroid indices $[O(1) \ldots O(t)]$, $O(t) \in \{1, \ldots, M\}$. The current state at time $t$ is modelled
as a stochastic variable $Q^{\mathrm{source}}(t) \in \{1, \ldots, N\}$.

The purpose of the medium-term layer is to estimate how well each source model can
explain the current input observation $O(t)$. The output is a column vector $u(t)$ with
elements indicating the conditional probabilities
$\phi^{\mathrm{source}}(t) = \mathrm{prob}(O(t) \mid O(t-1), \ldots, O(1), \lambda^{\mathrm{source}})$.



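The standard forward algorithm (Rabiner, 1989) is used to update recursively the state
probability column vector $p^{\mathrm{source}}(t)$. The elements $p_i^{\mathrm{source}}(t)$ of this vector indicate the
conditional probability that the sound source is in state $i$,
$p_i^{\mathrm{source}}(t) = \mathrm{prob}(Q^{\mathrm{source}}(t) = i, O(t) \mid O(t-1), \ldots, O(1), \lambda^{\mathrm{source}})$.

The recursive update equations are:

$$p^{\mathrm{source}}(t) = \left( (A^{\mathrm{source}})^T \, \hat{p}^{\mathrm{source}}(t-1) \right) \circ b^{\mathrm{source}}(O(t))$$

$$\phi^{\mathrm{source}}(t) = \mathrm{prob}(O(t) \mid O(t-1), \ldots, O(1), \lambda^{\mathrm{source}}) = \sum_{i=1}^{N} p_i^{\mathrm{source}}(t)$$

$$\hat{p}^{\mathrm{source}}(t) = p^{\mathrm{source}}(t) / \phi^{\mathrm{source}}(t)$$

wherein the operator $\circ$ defines element-wise multiplication.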
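
A single step of the normalised forward recursion for one discrete source model can be sketched as below. The indexing conventions are assumptions made for this sketch: `A[i][j]` is the probability of moving from state i to state j, `B[j][m]` is the probability of observing symbol m + 1 in state j, and the two-state model values are invented for illustration.

```python
def forward_step(A, B, p_hat_prev, symbol):
    """One normalised forward update for a discrete HMM.

    Returns (phi, p_hat): phi is the probability of the current symbol given all
    earlier symbols, p_hat is the normalised state probability vector.
    Assumes the symbol has non-zero probability in at least one reachable state.
    """
    N = len(A)
    # Transposed transition matrix applied to the previous state probabilities.
    predicted = [sum(A[i][j] * p_hat_prev[i] for i in range(N)) for j in range(N)]
    # Element-wise multiplication with the observation probabilities.
    p = [predicted[j] * B[j][symbol - 1] for j in range(N)]
    phi = sum(p)
    p_hat = [value / phi for value in p]
    return phi, p_hat


# Hypothetical two-state model with two observation symbols.
A = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.7, 0.3], [0.1, 0.9]]
phi, p_hat = forward_step(A, B, [0.5, 0.5], symbol=1)
```

Normalising at every step keeps the state vector in a fixed numeric range, which matters for long-running operation on a fixed-point or low-precision processor.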

Fig. 4 shows in more detail a slightly modified version of the dual layer HMM structure
illustrated in Fig. 3, in which the first layer of HMMs 520-522 comprises two additional
HMMs: a fourth HMM modelling a predetermined sound source of "speech in traffic noise"
and a fifth HMM modelling a predetermined sound source of "speech in cafeteria babble".

Signal OUT1 of the final HMM layer 550 estimates current probabilities for each of the
modelled listening environments by observing the stream of sound source probability
vectors from the previous layer of HMMs. The listening environment is represented by a
discrete stochastic variable $E(t) \in \{1 \ldots 3\}$, with outcomes coded as 1 for "speech in traffic
noise", 2 for "speech in cafeteria babble", and 3 for "clean speech". Thus, the output
probability vector or classification vector has three elements, one for each of these
environments. The final HMM layer 550 contains five states representing Traffic noise,
Speech (in traffic, "Speech/T"), Babble, Speech (in babble, "Speech/B"), and Clean
Speech ("Speech/C"). Transitions between listening environments, indicated by dashed
arrows, have low probability, and transitions between states within one listening
environment, shown by solid arrows, have relatively high probabilities.

The final HMM layer 550 consists of a Hidden Markov Model with five states and transition
probability matrix $A^{\mathrm{env}}$ (Fig. 4). The current state in the environment Hidden Markov
Model is modelled as a discrete stochastic variable $S(t) \in \{1 \ldots 5\}$, with outcomes coded as
1 for "traffic", 2 for speech (in traffic noise, "speech/T"), 3 for "babble", 4 for speech (in
babble, "speech/B"), and 5 for clean speech ("speech/C").

The speech in traffic noise listening environment, $E(t) = 1$, has two states, $S(t) = 1$ and
$S(t) = 2$. The speech in cafeteria babble listening environment, $E(t) = 2$, has two states,
$S(t) = 3$ and $S(t) = 4$. The clean speech listening environment, $E(t) = 3$, has only one
state, $S(t) = 5$. The transition probabilities between listening environments are relatively
low and the transition probabilities between states within a listening environment are high.

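The environment Hidden Markov Model 550 observes the stream of vectors
$[u(1) \ldots u(t)]$, where

$$u(t) = [\phi^{\mathrm{traffic}}(t) \; \phi^{\mathrm{speech}}(t) \; \phi^{\mathrm{babble}}(t) \; \phi^{\mathrm{speech}}(t) \; \phi^{\mathrm{speech}}(t)]^T$$

contains the estimated observation probabilities for each state. The probability for being
in a state given the current and all previous observations and given the environment
Hidden Markov Model,
$\hat{p}_i^{\mathrm{env}}(t) = \mathrm{prob}(S(t) = i \mid u(t), \ldots, u(1), A^{\mathrm{env}})$, is calculated with the forward algorithm (Rabiner,
1989),

$$p^{\mathrm{env}}(t) = \left( (A^{\mathrm{env}})^T \, \hat{p}^{\mathrm{env}}(t-1) \right) \circ u(t),$$

with elements $p_i^{\mathrm{env}}(t) = \mathrm{prob}(S(t) = i, u(t) \mid u(t-1), \ldots, u(1), A^{\mathrm{env}})$, and finally, with normalization,

$$\hat{p}^{\mathrm{env}}(t) = p^{\mathrm{env}}(t) \Big/ \sum_{i} p_i^{\mathrm{env}}(t).$$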

The probability for each listening environment, $p^E(t)$, given all previous observations and
given the environment Hidden Markov Model, can now be calculated as

$$p^E(t) = \begin{pmatrix} 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix} \hat{p}^{\mathrm{env}}(t)$$
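
The environment layer can be sketched as one forward step over the five states followed by a summation of state probabilities per listening environment. The sketch assumes that the observation vector repeats the speech source probability for the three speech states, and the transition matrix values are invented purely to show high within-environment and low between-environment probabilities; they are not values from Fig. 4. All function and variable names are hypothetical.

```python
# Rows: environments 1..3; columns: states 1..5 (traffic, speech/T, babble, speech/B, speech/C).
ENV_FROM_STATE = [
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1],
]


def state_observations(phi_traffic, phi_speech, phi_babble):
    """Observation probability per environment state, reusing the speech model."""
    return [phi_traffic, phi_speech, phi_babble, phi_speech, phi_speech]


def environment_step(A_env, p_hat_prev, u):
    """Normalised forward update followed by summation per listening environment."""
    N = len(A_env)
    predicted = [sum(A_env[i][j] * p_hat_prev[i] for i in range(N)) for j in range(N)]
    p = [predicted[j] * u[j] for j in range(N)]
    total = sum(p)
    p_hat = [value / total for value in p]
    p_env = [sum(row[j] * p_hat[j] for j in range(N)) for row in ENV_FROM_STATE]
    return p_hat, p_env


# Illustrative transition matrix (each row sums to 1).
A_env = [
    [0.80, 0.19, 0.004, 0.003, 0.003],
    [0.19, 0.80, 0.004, 0.003, 0.003],
    [0.004, 0.003, 0.80, 0.19, 0.003],
    [0.004, 0.003, 0.19, 0.80, 0.003],
    [0.0025, 0.0025, 0.0025, 0.0025, 0.99],
]
u = state_observations(phi_traffic=0.01, phi_speech=0.2, phi_babble=0.3)
p_hat, p_env = environment_step(A_env, [0.2] * 5, u)
```

With these example inputs the babble and speech models explain the observation best, so the second environment (speech in cafeteria babble) receives the largest probability.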



As previously mentioned, the spectrum estimation block 440 of Fig. 2 is optional but may
be utilized to estimate an average frequency spectrum which adapts slowly to the current
listening environment. Another possibility is to estimate two or more slowly adapting
spectra for different sound sources in a given listening environment, e.g. one speech
spectrum and one noise spectrum.

The source probabilities, $\phi^{\mathrm{source}}(t)$, the environment probabilities, $p^E(t)$, and the current
log power spectrum, $X(t)$, are used to estimate the current signal and noise log power
spectra. Two low-pass filters are used in the estimation, one filter for the signal spectrum
and one filter for the noise spectrum. The signal spectrum is updated if $p_1^E(t) > p_2^E(t)$ and
$\phi^{\mathrm{speech}}(t) > \phi^{\mathrm{traffic}}(t)$, or if $p_2^E(t) > p_1^E(t)$ and $\phi^{\mathrm{speech}}(t) > \phi^{\mathrm{babble}}(t)$. The noise spectrum is
updated if $p_1^E(t) > p_2^E(t)$ and $\phi^{\mathrm{traffic}}(t) > \phi^{\mathrm{speech}}(t)$, or if $p_2^E(t) > p_1^E(t)$ and
$\phi^{\mathrm{babble}}(t) > \phi^{\mathrm{speech}}(t)$.
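
The gated update of the two spectra can be sketched as below. The sketch assumes the update conditions read as follows: the signal spectrum is updated when speech is more probable than the noise source of the currently more probable noisy environment, and the noise spectrum is updated in the opposite case. The first-order low-pass filter, its coefficient `alpha`, and all names are assumptions; the text only states that two low-pass filters are used.

```python
def smooth(previous, current, alpha):
    """First-order low-pass filter applied per frequency bin."""
    return [(1.0 - alpha) * p + alpha * c for p, c in zip(previous, current)]


def update_spectra(signal_spec, noise_spec, log_power, p_env, phi, alpha=0.05):
    """Update the signal or noise log power spectrum depending on the classifier output.

    p_env: [speech in traffic, speech in babble, clean speech] probabilities.
    phi:   dict with source probabilities for "traffic", "speech" and "babble".
    """
    traffic_env = p_env[0] > p_env[1]
    babble_env = p_env[1] > p_env[0]
    speech_dominant = ((traffic_env and phi["speech"] > phi["traffic"]) or
                       (babble_env and phi["speech"] > phi["babble"]))
    noise_dominant = ((traffic_env and phi["traffic"] > phi["speech"]) or
                      (babble_env and phi["babble"] > phi["speech"]))
    if speech_dominant:
        signal_spec = smooth(signal_spec, log_power, alpha)
    if noise_dominant:
        noise_spec = smooth(noise_spec, log_power, alpha)
    return signal_spec, noise_spec


signal_spec, noise_spec = update_spectra(
    signal_spec=[0.0, 0.0],
    noise_spec=[0.0, 0.0],
    log_power=[1.0, 2.0],
    p_env=[0.7, 0.2, 0.1],
    phi={"traffic": 0.1, "speech": 0.4, "babble": 0.05},
    alpha=0.5,
)
```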

NOTATION: 

$M$  Number of centroids in the Vector Quantizer
$N$  Number of states in the HMM
$\lambda^{\mathrm{source}} = \{A^{\mathrm{source}}, B^{\mathrm{source}}, \alpha_0^{\mathrm{source}}\}$  Compact notation for a discrete HMM, describing a source,
with $N$ states and $M$ observation symbols
$B$  Block size
$O = [O(1) \ldots O(t)]$  Observation sequence
$O(t) \in \{1, \ldots, M\}$  Discrete observation at time $t$
$f(t)$  Feature vector
$w$  Window of size $B$
$x(t)$  One block of size $B$, at time $t$, of raw input samples
$X(t)$  The corresponding discrete complex spectrum, of size $B$, at time $t$

REFERENCES

Rabiner, L. R. A Tutorial on Hidden Markov Models and Selected Applications in Speech
Recognition. Proc. IEEE, vol. 77, no. 2, February 1989.

Linde, Y., Buzo, A., and Gray, R. M. An Algorithm for Vector Quantizer Design. IEEE
Trans. Comm., COM-28:84-95, January 1980.

CLAIMS

1. A hearing prosthesis comprising:

a microphone adapted to generate an input signal in response to receiving an acoustic
signal from a listening environment,

an output transducer for converting a processed output signal into an electrical or an
acoustic output signal,

processing means adapted to process the input signal in accordance with a
predetermined signal processing algorithm and related algorithm parameters to generate
the processed output signal,

a memory area storing values of the related algorithm parameters for the predetermined
signal processing algorithm,

the processing means being further adapted to:

segment the input signal into consecutive signal frames of time duration, $T_{\mathrm{frame}}$, and

generate respective feature vectors, $O(t)$, representing predetermined signal features of
the consecutive signal frames,

process the feature vectors with at least one Hidden Markov Model,
$\lambda^{\mathrm{source}} = \{A^{\mathrm{source}}, b(O(t)), \alpha_0^{\mathrm{source}}\}$, associated with a predetermined sound source to
determine an element value or values of a classification vector indicating a probability of
the predetermined sound source being active in the listening environment,

control one or several values of the related algorithm parameters in dependence of
element value(s) of the classification vector,

thereby adapting characteristics of the predetermined signal processing algorithm to the
current listening environment; wherein:




$A^{\mathrm{source}}$ = A state transition probability matrix;

$b(O(t))$ = Probability function for an input observation $O(t)$ for each state of the at least
one Hidden Markov Model;

$\alpha_0^{\mathrm{source}}$ = An initial state probability distribution vector.

2. A hearing prosthesis according to claim 1, wherein the processing means are adapted 
to: 

compare each of the feature vectors, $O(t)$, with a feature vector set to determine, for
substantially each feature vector, an associated symbol value so as to generate an
observation sequence of symbol values associated with the consecutive signal frames,

process the observation sequence of symbol values with at least one discrete Hidden
Markov Model, $\lambda^{\mathrm{source}} = \{A^{\mathrm{source}}, B^{\mathrm{source}}, \alpha_0^{\mathrm{source}}\}$, associated with the predetermined sound
source to determine the element value(s) of the classification vector; wherein:

$B^{\mathrm{source}}$ = An observation symbol probability distribution matrix.

3. A hearing prosthesis according to claim 1 or 2, wherein the processing means are 
adapted to process the feature vectors with a plurality of Hidden Markov Models, or 

process the observation sequence of symbol values with a plurality of discrete Hidden
Markov Models,

each of the discrete Hidden Markov Models or the Hidden Markov Models being
associated with respective predetermined sound sources to determine the element values
of the classification vector, each element value indicating a probability of a respective
predetermined sound source being active in the listening environment. 

4. A hearing prosthesis according to any of the preceding claims, wherein the value of
$T_{\mathrm{frame}}$ lies between 1 and 100 milliseconds, such as about 5 - 10 milliseconds.

5. A hearing prosthesis according to claim 3, wherein at least some of the plurality of
Hidden Markov Models are adapted to model respective predetermined sound sources
selected from the group consisting of: {speech, telephone speech, traffic noise, multi-talker
or babble noise, subway noise, transient noise, wind noise}.

6. A hearing prosthesis according to any of the preceding claims, wherein each of the
respective feature vectors comprises a plurality of frequency-domain parameters
representing the predetermined signal features of the consecutive signal frames.

7. A hearing prosthesis according to any of the preceding claims, wherein each of the 
respective feature vectors comprises a plurality of time-domain parameters representing 
the predetermined signal features of the consecutive signal frames. 

8. A hearing prosthesis according to any of the preceding claims, wherein each of the
respective feature vectors comprises a plurality of cepstrum parameters or differential
cepstrum parameters representing the predetermined signal features of the consecutive
signal frames.

9. A hearing prosthesis according to any of claims 2-8, wherein the feature vector set has
been determined in an off-line training procedure which utilised real-life sound source
recordings and stored in non-volatile memory locations of the hearing instrument.

10. A hearing prosthesis according to claim 9, wherein the real-life sound
recordings have been applied to an input signal path of a target hearing prosthesis or by
performing an equivalent signal processing of the input signal to simulate characteristics
of the input signal path.

11. A hearing prosthesis according to any of claims 2-10, wherein each of the feature
vectors is associated with respective integer symbol values during a vector quantisation
process.

12. A hearing prosthesis according to any of the preceding claims, wherein the Hidden
Markov Model or Models comprise at least one ergodic Hidden Markov Model.

13. A hearing prosthesis according to any of the preceding claims, wherein the at least
one predetermined Hidden Markov Model or each of the plurality of predetermined Hidden
Markov Models comprises between 2 and 10 states.

14. A hearing prosthesis according to any of the preceding claims, wherein the at least
one predetermined Hidden Markov Model or each of the plurality of predetermined Hidden
Markov Models comprises between 8 and 256 discrete symbols.

15. A hearing prosthesis according to any of the preceding claims, wherein the processing
means are adapted to process the input signal in accordance with at least two different
predetermined signal processing algorithms, each being associated with a respective set
of algorithm parameters,

the processing means being further adapted to control a switching between the at least
two predetermined signal processing algorithms in dependence of the determined
element value(s) of the classification vector.

16. A hearing prosthesis according to claim 15, wherein the processing means are
adapted to process two input signals from a pair of omni-directional microphones by a first
predetermined signal processing algorithm with a first set of algorithm parameters, and

adapted to process the two input signals by a second predetermined signal processing
algorithm with a second set of algorithm parameters.

17. A hearing prosthesis according to any of claims 3-16, wherein the processing means 
further comprises a decision controller adapted to monitor the elements of the 
classification vector and control transitions between the plurality of Hidden Markov Models 
in accordance with a predetermined set of rules. 

18. A hearing prosthesis according to any of claims 3-16, wherein the processing means
are adapted to process the observation sequence of symbol values or the feature vectors
with a first set of Hidden Markov Models operating at a first time scale and associated with
a first set of predetermined sound sources to determine element values of a first
classification vector, and

adapted to process the first classification vector with a second set of Hidden Markov
Models operating at a second time scale and associated with a second set of
predetermined sound sources to determine element values of a second classification
vector.

19. A hearing prosthesis according to claim 18, wherein the first time scale is selected
within the range 10 - 100 ms and the second time scale is selected within the range 1 -
60 seconds.

20. A hearing prosthesis according to any of the preceding claims, wherein the processing 
means comprises a software programmable processor. 

21. A method of generating automatic classification of input signals in a hearing
prosthesis, the method comprising the steps of:

receiving an acoustic signal from a listening environment by a microphone of the hearing
prosthesis to generate an input signal,

processing the input signal in accordance with a predetermined signal processing
algorithm and related algorithm parameters to generate a processed output signal,

segmenting the input signal into consecutive signal frames of time duration, $T_{\mathrm{frame}}$,

generating respective feature vectors, $O(t)$, representing predetermined signal features
of the consecutive signal frames,

processing the feature vectors with at least one Hidden Markov Model,
$\lambda^{\mathrm{source}} = \{A^{\mathrm{source}}, b(O(t)), \alpha_0^{\mathrm{source}}\}$, associated with a predetermined sound source to
determine element value(s) of a classification vector indicating a probability of the
predetermined sound source being active in the listening environment,

controlling one or several values of the related algorithm parameters in dependence of
element value(s) of the classification vector to control characteristics of the processed
output signal,

converting the processed output signal into an electrical or an acoustic output signal or
signals by one or several output transducers,

thereby adapting characteristics of the main processing algorithm to the listening
environment,

$A^{\mathrm{source}}$ = A state transition probability matrix;
$b(O(t))$ = Probability function for the observation $O(t)$ for each state of the at least one
Hidden Markov Model;

$\alpha_0^{\mathrm{source}}$ = An initial state probability distribution vector.

22. A method according to claim 21, comprising the steps of:

comparing each of the respective feature vectors, $O(t)$, with a feature vector set,

determining, for substantially each feature vector, an associated symbol value so as to
generate an observation sequence of symbol values associated with the consecutive
signal frames,

processing the observation sequence of symbol values with at least one discrete Hidden
Markov Model, $\lambda^{source} = \{A^{source}, B^{source}, \alpha_0^{source}\}$, associated with the predetermined sound
source to determine the element value or values of the classification vector,

$B^{source}$ = An observation symbol probability distribution matrix.

23. A method according to claim 21 or 22, wherein the processor is adapted to process
the feature vectors with a plurality of Hidden Markov Models, or process the observation
sequence of symbol values with a plurality of discrete Hidden Markov Models,

each of the discrete Hidden Markov Models or the Hidden Markov Models being
associated with respective predetermined sound sources to determine the element values
of the classification vector, each element value indicating a probability of the respective
predetermined sound source being active in the listening environment.

A HEARING PROSTHESIS WITH AUTOMATIC CLASSIFICATION OF THE LISTENING 
ENVIRONMENT 

FIELD OF THE INVENTION 

The present invention relates to a hearing prosthesis and method providing automatic
identification or classification of a listening environment by applying one or several
predetermined Hidden Markov Models to acoustic signals obtained from the listening
environment. Processing means within the hearing prosthesis may utilise determined
classification results to control parameter values of a main processing algorithm or to
control a switching between different pre-set listening programs so as to optimally adjust
the signal processing of the hearing prosthesis to a given listening environment.



BACKGROUND OF THE INVENTION 




Today's digitally controlled or Digital Signal Processing (DSP) hearing instruments are
often provided with a number of pre-set listening programs. These pre-set listening
programs are often included to accommodate comfortable and intelligible reproduced
sound quality in differing listening environments. Audio signals obtained from these
listening environments may have highly different characteristics, e.g. in terms of average
and maximum sound pressure levels (SPLs) and/or frequency content. Therefore, for
DSP based hearing prostheses, each type of listening environment may require a
particular setting of algorithm parameters of a signal processing algorithm of the hearing
prosthesis to ensure that the user is provided with an optimum reproduced signal quality
in all types of listening environments. The algorithm parameter values to be adjusted
typically include parameters related to corner frequencies or slopes of filter algorithms or
routines. Also, algorithm parameter values controlling e.g. knee-points and compression
ratios of Automatic Gain Control (AGC) algorithms are often altered from one pre-set
listening program to another.




For today's DSP based hearing aids, a number of different listening programs, each
tailored to a particular listening environment and/or particular user preferences, are typically
determined during a fitting session in a dispenser's office. Thereafter, corresponding
algorithm parameters are loaded into a non-volatile memory of the hearing prosthesis.
The hearing aid user is subsequently left with the task of manually selecting, typically by
actuating a push button on the hearing aid or a program button on a remote control,
between the pre-set listening programs in accordance with the current listening or sound
environment. Accordingly, when entering and leaving the multitude of sound
environments in his/her daily whereabouts, the hearing aid user may have to devote his/her
attention to the delivered sound quality and continuously search for the best program
setting in terms of comfortable sound quality and/or the best speech intelligibility.

It would therefore be highly desirable to provide a hearing prosthesis, such as a hearing
aid or cochlear implant device, that is capable of automatically classifying the current
listening environment of the user as belonging to one of a number of typical everyday
listening environments. Thereafter, classification results could be utilised in the hearing
prosthesis to adjust the algorithm parameters or to switch between different pre-set
listening programs to obtain optimum sound quality and/or speech intelligibility for the
individual hearing aid user.

Attempts have been made in the past to adapt the signal processing of a hearing aid
to the type of listening environment that the user is situated in. US 5,687,241 discloses a
multi-channel DSP based hearing instrument that utilises continuous determination or
calculation of one or several percentile values of input signal amplitude distributions to
discriminate between speech and noise input signals in the listening environment. Gain
values in the frequency channels are subsequently altered in response to the detected
levels of speech and noise. However, it is often desirable to discriminate between subtle
characteristics of the input signal of the hearing aid, not just between speech and noise.
As an example, it may be desirable to switch between an omni-directional and a
directional microphone listening program in dependence of not just the level of
background noise, but also further signal characteristics of this background noise. In
situations where the user of the hearing prosthesis communicates with another individual
in the presence of background noise, it would be beneficial to be able to identify
and classify the type of background noise. Omni-directional operation could be selected in
the event that the noise is traffic noise, to allow the user to clearly hear approaching
traffic independent of its direction of arrival. If, on the other hand, the background noise was
classified as being babble noise, the directional listening program could be selected to
allow the user to improve the signal to noise ratio in the communication with the other
individual.

Such a detailed characterisation of an input signal from a listening environment may be
obtained by applying Hidden Markov Models for analysis and classification of the input
signal. Hidden Markov Models are capable of modelling stochastic input signals in terms
of both short and long time temporal variations rather than just being restricted to
modelling long term amplitude distribution statistics or average power. Hidden Markov
Models are well known in the field of speech recognition as a tool for modelling statistical
properties of stochastic speech signals. The article "A Tutorial on Hidden Markov Models
and Selected Applications in Speech Recognition", published in Proceedings of the IEEE,
Vol. 77, No. 2, February 1989, deals comprehensively with the application of Hidden
Markov Models to problems in speech recognition.

The present applicant has, however, for the first time applied Hidden Markov Models to
the task of classifying the listening environment of a hearing prosthesis to provide
automatic adjustment of the parameters of a main signal processing algorithm executed in
processing means of the hearing prosthesis in dependence of these classification results.

SUMMARY OF THE INVENTION 

One object of the invention is to provide a hearing prosthesis that automatically adjusts
itself to a surrounding listening environment by controlling one or several algorithm
parameters of a signal processing algorithm to allow a user to obtain intelligible and
comfortable amplified sound in a variety of different listening environments.

It is another object of the invention to provide a hearing prosthesis that continuously and
automatically classifies an input signal as belonging to one of several everyday listening
environments and indicates the classification results to processing means to allow the
latter to perform the above-mentioned control of the algorithm parameters.

DESCRIPTION OF THE INVENTION 

A first aspect of the invention relates to a hearing prosthesis with automatic sound
classification, the hearing prosthesis comprising:

a microphone adapted to generate an input signal in response to receiving an acoustic
signal from a listening environment,

an output transducer for converting the processed output signal into an electrical or an
acoustic output signal,

processing means adapted to process the input signal in accordance with a main signal
processing algorithm and a plurality of related algorithm parameters to generate the
processed output signal,

a memory area storing the plurality of related algorithm parameters for the processing
algorithm, the processing means being further adapted to:

segment the input signal into consecutive signal frames of time duration, $T_{frame}$, and
generate respective feature vectors, $O_t$, representing predetermined signal features of the
consecutive signal frames,

process the feature vectors with at least one Hidden Markov Model, $\lambda_s = \{A_s, b(O_t), \alpha_0^s\}$,
associated with a predetermined sound source to determine an element value or values of
a classification vector indicating a probability of the predetermined sound source being
active in the current listening environment,

control one or several values of the plurality of related algorithm parameters in
dependence of a determined element value or values of the classification vector, thereby
adapting characteristics of the main processing algorithm to the current listening
environment. The at least one Hidden Markov Model (HMM) comprises:

$A_s$ = A state transition probability matrix;

$b(O_t)$ = Probability function for the input observation $O_t$ for each state of the at least one
Hidden Markov Model;

$\alpha_0^s$ = An initial state probability distribution vector.

The hearing prosthesis may be a hearing instrument or aid such as a Behind The Ear
(BTE), an In The Ear (ITE) or Completely In the Canal (CIC) hearing aid. The input signal
generated by the microphone may be an analogue signal or a digital signal in a multi-bit
format or in single bit format generated by a microphone amplifier/buffer or an integrated
analogue-to-digital converter, respectively. Preferably, the input signal to the processing
means is provided as a digital signal, so if the microphone signal is in analogue form, it is
preferably converted into a digital signal by a suitable analogue-to-digital converter (A/D
converter) included in an integrated circuit of the hearing prosthesis. The input signal may
be subjected to various signal processing such as amplification and bandwidth limiting
before being applied to the A/D converter and other processing afterwards such as
decimation before the input signal is applied to the processing means.

The output transducer that converts the processed output signal into an acoustic or
electrical signal or signals may be a conventional hearing aid receiver or another sound
pressure transducer producing a perceivable acoustic signal to the user of the hearing
prosthesis. The output transducer may also comprise a number of electrodes that may be
operatively connected to the user's auditory nerve or nerves.

In the present specification and claims the term "main signal processing algorithm"
designates any processing algorithm, executed by the processing means of the hearing
prosthesis, that generates the processed output signal from the input signal. Accordingly,
the "main signal processing algorithm" may contain a single processing algorithm only or
a plurality of smaller separate processing algorithms or sub-algorithms. The term main
signal processing algorithm may, accordingly, also designate a number of separate
processing algorithms simultaneously executed in the hearing prosthesis. As an example,
a separate processing algorithm may be constituted by a frequency selective filtering
algorithm, a single or multi-channel compression algorithm, a feedback cancellation
algorithm, a speech detecting algorithm, etc. Typically, but not necessarily, a number of
such separate processing algorithms are included and simultaneously executed in the main
signal processing algorithm of today's DSP based hearing aids. Furthermore, a number of
different main signal processing algorithms may be stored in the hearing prosthesis in the
form of respective pre-set listening programs which the user may be able to switch
between in accordance with his/her preferences. However, such different pre-set
listening programs may consist of identical collections of separate processing algorithms
where only algorithm parameter values differ between the pre-set listening programs.
Alternatively, the pre-set listening programs may consist of different collections of
separate processing algorithms so that e.g. pre-set program 1, or equivalently main signal
processing algorithm 1, comprises a multi-channel compression algorithm while pre-set
program 2 includes the same multi-channel compression algorithm and an additional
feedback cancellation algorithm.

The main signal processing algorithm will typically have a plurality of related algorithm
parameters. These algorithm parameters can usually be divided into a number of smaller
parameter sets, where each such parameter set is related to a particular part of the main
signal processing algorithm or a particular separate processing algorithm as explained
above. These parameter sets control certain characteristics of their respective separate
signal processing algorithms such as corner frequencies and slopes of filters,
compression thresholds and ratios of compressor algorithms, adaptation rates and probe
signal characteristics of feedback cancellation algorithms, etc.




The algorithm parameters are preferably intermediately stored in a volatile data memory
area of the processor means such as a data RAM area during execution of the main
signal processing algorithm. Initial values of the algorithm parameters are stored in a
non-volatile memory area such as an EEPROM/Flash memory area or battery backed-up RAM
memory area so that these algorithm parameters are retained during power supply
interruptions which may be caused by the user's removal or replacement of a battery or
other power source from the hearing prosthesis.

The processing means may comprise one or several processors and its/their associated
memory circuitry. The processor may be constituted by a fixed point or floating point
Digital Signal Processor (DSP) with a single or dual MAC architecture that performs both
the calculations required in the main signal processing algorithm as well as a number of
so-called household tasks such as monitoring and reading values of external interface
signals and programming ports. Alternatively, the processing means may comprise a DSP
that performs number crunching, i.e. multiplication, addition, division, etc. while a
commercially available, or even proprietary, microprocessor kernel handles the household
tasks which mostly involve logic operations and decision making.

The DSP may be a software programmable type executing the main signal processing
algorithm in accordance with instructions stored in an associated program RAM area. A
data RAM area integrated with the processing means may store initial and intermediate
values of the related algorithm parameters and other data variables during execution of
the main signal processing algorithm as well as various other household variables. Such a
software programmable DSP may be advantageous for some applications due to the
possibility of implementing and testing modifications of the main signal processing
algorithm rapidly. Clearly the same considerations are applicable to the household tasks
or sub-programs that often form part of a user interface of the hearing prosthesis.
Alternatively, the processing means may be constituted by a hard-wired DSP core so as
to execute one or several fixed main signal processing algorithm(s) in accordance with a
fixed set of instructions from an associated logic controller. In this type of hard-wired
processor architecture, the memory area storing the plurality of related algorithm
parameters may be provided in the form of a register file, or as a RAM area, if the number
of algorithm parameters justifies the latter solution.

According to the invention, the processing means are further adapted to segment the
input signal into consecutive signal frames of duration $T_{frame}$ and generate respective
feature vectors, $O_t$, representing predetermined signal features of the consecutive signal
frames. These feature vectors are subsequently processed with at least one Hidden
Markov Model, $\lambda_s = \{A_s, b(O_t), \alpha_0^s\}$, associated with a predetermined sound source to
determine an element value or values of a classification vector. This classification vector
indicates a probability of the predetermined sound source being active in the current
listening environment. By controlling one or several values of the plurality of algorithm
parameters related to the main signal processing algorithm in dependence of a
determined element value or values of the classification vector, the processing of the input
signal is adapted to the current listening environment. The consecutive signal frames may
be non-overlapping or may overlap by a predetermined amount, e.g. by about 10 % - 50 %,
to avoid sharp discontinuities at boundaries between neighbouring signal frames and/or
to counteract window effects of an applied window, e.g. a Hanning window, at the
boundaries. While the above-mentioned frame segmentation of the input signal is required
when the processing means process the feature vectors with the at least one Hidden
Markov Model, the main signal processing algorithm may process the input signal on a
sample-by-sample basis or on a frame-by-frame basis with a frame time equal to or
different from $T_{frame}$.
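
The frame segmentation described above can be illustrated with a minimal sketch. The sampling rate, frame length, overlap fraction and the function names `hann_window` and `segment_frames` are assumptions chosen for illustration; the specification only states that frames may overlap by about 10 % - 50 % and that a Hanning window may be applied.

```python
import math

def hann_window(n):
    """Return an n-point Hann (Hanning) window."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def segment_frames(signal, frame_len, overlap=0.25):
    """Split `signal` into windowed frames of `frame_len` samples.

    `overlap` is the fraction of a frame shared with its neighbour
    (0 gives non-overlapping frames). Trailing samples that do not
    fill a complete frame are dropped.
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    hop = max(1, int(round(frame_len * (1.0 - overlap))))
    window = hann_window(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames

# Assumed numbers: 16 kHz sampling, 6 ms frames (96 samples), 25 % overlap.
fs = 16000
frame_len = round(0.006 * fs)
signal = [math.sin(2.0 * math.pi * 440.0 * n / fs) for n in range(fs // 10)]
frames = segment_frames(signal, frame_len, overlap=0.25)
```

Each element of `frames` would then be passed to the feature extraction stage to produce one feature vector per frame.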

The at least one Hidden Markov Model may comprise at least one discrete Hidden
Markov Model, $\lambda_s = \{A_s, B_s, \alpha_0^s\}$, wherein $B_s$ is an observation symbol probability
distribution matrix which serves as a discrete replacement of the general function, $b(O_t)$,
defining the probability function for the input observation $O_t$ for each state of a Hidden
Markov Model. In this discrete case, the processing means are preferably adapted to
compare each of the respective feature vectors, $O_t$, with a plurality of predetermined
feature vectors, often denoted "code book", to determine, for substantially each respective
feature vector, an associated symbol value so as to generate an observation sequence of
symbol values associated with the consecutive signal frames. This process of determining
symbol values from the respective feature vectors is sometimes denoted vector
quantization. Thereafter, the observation sequence of symbol values is processed with
the at least one discrete Hidden Markov Model, $\lambda_s$, which is associated with the
predetermined sound source to determine the element value or values of the classification
vector.

According to a preferred embodiment of the invention, the processing means are adapted
to process the feature vectors with a plurality of Hidden Markov Models, or process the
observation sequence of symbol values with a plurality of discrete Hidden Markov Models.
Each of the discrete Hidden Markov Models or each of the Hidden Markov Models is
preferably associated with a respective predetermined sound source to determine the
element values of the classification vector. Each element value may directly represent a
probability (i.e. a value between 0 and 1) of the associated predetermined sound source
being active in the current listening environment.
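
One way to obtain such element values from a set of discrete HMMs is sketched below: the scaled forward algorithm gives the log-likelihood of the observation sequence under each model, and the likelihoods are then normalised to values between 0 and 1. The normalisation with equal prior probability for every source, the function names and the two toy models are assumptions made for illustration; the specification does not prescribe how the element values are computed.

```python
import math

def log_likelihood(symbols, A, B, alpha0):
    """Scaled forward algorithm: return log P(symbols | model).

    A[i][j]   : probability of moving from state i to state j
    B[i][k]   : probability of emitting symbol k in state i
    alpha0[i] : initial state probability
    """
    if not symbols:
        return 0.0
    n = len(alpha0)
    alpha = [alpha0[i] * B[i][symbols[0]] for i in range(n)]
    log_p = 0.0
    for t in range(len(symbols)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][symbols[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        alpha = [a / scale for a in alpha]   # rescale to avoid underflow
        log_p += math.log(scale)
    return log_p

def classification_vector(symbols, models):
    """Map per-source log-likelihoods to values in [0, 1] that sum to 1.

    Assumes equal prior probability for every source (illustrative choice).
    """
    logs = {name: log_likelihood(symbols, *m) for name, m in models.items()}
    peak = max(logs.values())
    if peak == float("-inf"):
        return {name: 1.0 / len(models) for name in models}
    weights = {name: math.exp(v - peak) for name, v in logs.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Two hypothetical 2-state models over a 2-symbol alphabet: (A, B, alpha0).
models = {
    "steady": ([[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.8, 0.2]], [0.5, 0.5]),
    "random": ([[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]], [0.5, 0.5]),
}
vector = classification_vector([0] * 20, models)
```

For the all-zero symbol sequence used here, the element for the hypothetical "steady" model dominates, since that model emits symbol 0 with high probability in both states.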



The duration of the signal frames, $T_{frame}$, is preferably selected to be within the range 1 -
100 milliseconds, such as about 5 - 10 milliseconds. These ranges allow the applied
Hidden Markov Models or Model to operate on time scales of the input signal that are
comparable to individual features, e.g. phonemes, of speech signals and on envelope
modulations of a number of relevant noise sound sources.

A predetermined sound source may be any natural or synthetic sound source such as a
speech source, a traffic noise source, multi-talker or babble source, subway noise source,
transient noise source or a wind noise source. Temporal and spectral characteristics of
each of these source types may have been obtained through real-life recordings of one or
several representative sound sources. The temporal and spectral characteristics for each
type of predetermined sound source are preferably obtained by making real-life
recordings of a number of such representative sound sources and concatenating these
recordings into a single recording (or sound file). For speech sound sources, the present
inventors have found that utilising about 10 different speakers, preferably 5 males and 5
females, will generally provide good classification results in the Hidden Markov Model
associated with the speech source. When the concatenated sound source recording has
been formed, feature vectors, preferably identical to those feature vectors that are
generated by the processor means in the hearing prosthesis, are extracted from the
concatenated sound source recording to form a training observation sequence for the
associated continuous or discrete HMM. The duration of the training sequence depends
on the type of sound source, but it has been found that a duration of about 3 - 20 minutes,
such as about 4 - 6 minutes, is adequate for many types of sound sources including
speech sound sources. Thereafter, for each sound source, the respective HMM is trained
with the generated training observation sequence, preferably by the Baum-Welch
iterative algorithm, to obtain values of $A_s$, the state transition probability matrix, values of
$B_s$, the observation symbol probability distribution matrix (for discrete HMM models), and
values of $\alpha_0^s$, the initial state probability distribution vector. If the HMM is ergodic, the
values of the initial state probability distribution vector are determined from the state
transition probability matrix.
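
As a hedged illustration of the last point, one plausible reading is that the initial state vector is taken as the stationary distribution of the transition matrix, i.e. the vector that is left unchanged by one transition step. The specification does not spell out the procedure; the power iteration below, the function name and the example matrix are assumptions, and the iteration presumes an ergodic, aperiodic chain.

```python
def stationary_distribution(A, iterations=1000, tol=1e-12):
    """Approximate the stationary distribution pi with pi = pi * A.

    A[i][j] is the probability of moving from state i to state j.
    Starts from a uniform vector and repeats the transition step
    until successive vectors differ by less than `tol`.
    """
    n = len(A)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(pi[i] * A[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi

# Hypothetical 2-state transition matrix.
A = [[0.9, 0.1],
     [0.2, 0.8]]
alpha0 = stationary_distribution(A)
```

For this example matrix the balance condition 0.1 * pi[0] = 0.2 * pi[1] gives the distribution (2/3, 1/3), which the iteration approaches.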



The respective feature vectors that are formed from the consecutive signal frames may
represent spectral properties of the signal frames, temporal properties of the signal
frames or a combination of both. The spectral properties may be expressed in the form of
Discrete Fourier Transform coefficients, Linear Predictive Coding parameters or cepstrum
parameters. If cepstrum parameters are utilised, it has been found advantageous to
discard the 0'th cepstrum coefficient in the further processing, because this coefficient
contains information about the total sound pressure level of the input signal. Since it is
often desirable to obtain a level independent recognition and classification of a sound
source, this total sound pressure level information is irrelevant for the HMMs and may be
skipped to improve robustness of sound source recognition and save an accompanying
calculation and memory burden in the processor means while processing the HMMs.
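
The level independence obtained by discarding the 0'th coefficient can be checked with a small sketch. It uses one common definition of the real cepstrum (inverse DFT of the log-magnitude spectrum); the specification does not state how its cepstrum parameters are computed, so the definition, the function name and the test frame are assumptions.

```python
import cmath
import math

def real_cepstrum(frame, n_coeffs=12, floor=1e-12):
    """Return cepstrum coefficients 1..n_coeffs of one frame, omitting coefficient 0."""
    n = len(frame)
    # Log-magnitude spectrum via a direct DFT (O(n^2), adequate for short frames).
    log_mag = []
    for k in range(n):
        acc = sum(frame[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        log_mag.append(math.log(max(abs(acc), floor)))
    # Inverse DFT of the log-magnitude spectrum; keep the real part.
    ceps = []
    for q in range(n_coeffs + 1):
        acc = sum(log_mag[k] * cmath.exp(2j * math.pi * k * q / n) for k in range(n))
        ceps.append(acc.real / n)
    return ceps[1:]   # drop c0, which carries the overall level

# An arbitrary 32-sample test frame and the same frame amplified by 20 dB.
frame = [math.sin(0.3 * m) + 0.5 * math.cos(1.7 * m) + 0.1 * ((7 * m) % 5)
         for m in range(32)]
louder = [10.0 * x for x in frame]
features = real_cepstrum(frame)
features_louder = real_cepstrum(louder)
```

A pure gain adds the same constant to every bin of the log-magnitude spectrum, and the inverse DFT of a constant is non-zero only at index 0, so `features` and `features_louder` agree up to rounding error.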

If a discrete HMM or discrete HMMs are utilised, the plurality of predetermined feature
vectors, or the code book, may have been determined in an off-line training procedure
which utilised real-life sound source recordings. The number of predetermined feature
vectors in the code book may vary depending on the particular application, but for a
hearing aid, it has been found that a code book comprising about 8 to 256 different
feature vectors, such as 32 - 64 different feature vectors, provides an adequate coverage
of the complete feature space. The comparison between each of the feature vectors
computed from the signal frames and the code book provides a symbol value which may
be selected by choosing an integer index belonging to that code book vector nearest to
the feature vector in question. Thus, the output of this vector quantization process may be
a sequence of integer indexes representing the corresponding symbol values.
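
The nearest-vector selection can be sketched as follows. Squared Euclidean distance is assumed as the measure of "nearest", which the specification does not fix, and the tiny two-dimensional code book is purely illustrative (the preferred embodiment uses 64 vectors of 12 cepstrum parameters).

```python
def quantize(feature, code_book):
    """Return the index of the code book vector nearest to `feature`.

    Nearness is measured by squared Euclidean distance (an assumption).
    """
    best_index, best_dist = 0, float("inf")
    for index, code_vector in enumerate(code_book):
        dist = sum((f - c) ** 2 for f, c in zip(feature, code_vector))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index

# Hypothetical code book with four 2-dimensional vectors.
code_book = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
feature_vectors = [[0.1, -0.2], [0.9, 0.8], [0.2, 0.7]]
symbols = [quantize(f, code_book) for f in feature_vectors]
```

The resulting `symbols` list is the observation sequence of integer indexes that a discrete HMM would consume.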

To generate the code book feature vectors, which were obtained through processing of
real-life sound recordings as explained above, so as to closely resemble the feature
vectors generated in the hearing prosthesis during on-line processing of the input signal,
the real-life sound recordings may have been recorded through an input signal path of a
target hearing prosthesis. By adopting such a procedure, frequency response deviations
as well as other linear or non-linear distortions generated by the input signal path of the
target hearing prosthesis are tracked by introducing corresponding signal characteristics
in the code book feature vectors. Thus, a close resemblance between the code book
feature vectors and the on-line generated feature vectors is secured to provide optimum
recognition and classification results from the subsequent processing in the discrete
Hidden Markov Model or Models. A similar advantageous effect may, naturally, be
obtained by performing a signal processing of the real-life sound recordings which is
similar to the processing of the input signal path of the target hearing prosthesis before
extraction of the code book feature vectors is performed. This could be implemented by
applying suitable analogue and/or digital filters or filter algorithms to the input signal
tailored to simulate a priori known characteristics of the input signal path in question.

While it has proven helpful to utilise so-called left-to-right Hidden Markov Models in the
field of speech recognition where the known temporal characteristics of words and
utterances are matched in a structure of the model, the present inventors have found it
advantageous to use at least one ergodic Hidden Markov Model, and, preferably, to use
ergodic Hidden Markov Models for all applied Hidden Markov Models. An ergodic Hidden
Markov Model is a model in which it is possible to reach any internal state from any other
internal state in the model.

The number of internal model states of any particular HMM of the plurality of HMMs may
depend on the particular type of predetermined sound source modelled. A simple nearly
constant noise source may be adequately modelled by a HMM with only a few internal
states while more complex sound sources such as speech or mixed speech and complex
noise sources may require additional internal states. Preferably, the at least one Hidden
Markov Model or each of the plurality of Hidden Markov Models comprises between 2 and
10 states, such as between 3 and 8 states. According to a preferred embodiment of the
invention, four HMMs are used by a proprietary DSP in a hearing instrument, where each
of the four HMMs has 4 internal states. The four HMMs are associated with four
common predetermined sound sources: speech source, traffic noise source, multi-talker
or babble source, and subway noise source, respectively. A code book with 64 feature
vectors, each consisting of 12 cepstrum parameters, is utilised to provide vector
quantisation of the feature vectors derived from the input signal of the hearing aid.
However, the plurality of feature vectors may comprise between 8 and 256 different
feature vectors, such as 32 - 64 different feature vectors, without taking up an excessive
amount of memory in the hearing aid DSP.

The processing means may be adapted to process the input signal in accordance with at least
two different main signal processing algorithms, each being associated with a respective
set of algorithm parameters, where the processing means are further adapted to control a
transition between the at least two main signal processing algorithms in dependence of
the determined element value or values of the classification vector. This embodiment of
the invention is particularly useful if the hearing prosthesis is equipped with two,
preferably omni-directional, microphones generating two corresponding input signals, so
that both a directional and an omni-directional pre-set listening program are provided by
respective main signal processing algorithms and algorithm parameter sets. The
transition between the directional and the omni-directional pre-set listening programs, or
main signal processing algorithms, and vice versa is preferably performed in a smooth
manner through a range of intermediate steps, where the directionality of the processed
output signal is gradually increased/decreased. The user thus does not experience abrupt
changes in the reproduced sound, but rather e.g. a smooth improvement in signal to noise
ratio.
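
A gradual transition through intermediate steps might, for instance, be realised by interpolating between the two parameter sets. The linear interpolation, the number of steps and the parameter names `directional_mix` and `broadband_gain_db` are hypothetical and serve only to illustrate the idea; the specification does not define the intermediate steps.

```python
def blend_parameters(omni, directional, weight):
    """Linearly interpolate two parameter sets.

    weight 0.0 returns the omni-directional set, 1.0 the directional set.
    """
    return {name: (1.0 - weight) * omni[name] + weight * directional[name]
            for name in omni}

def transition_weights(n_steps):
    """Weights for a transition in n_steps equal increments, ending at 1.0."""
    return [step / n_steps for step in range(1, n_steps + 1)]

# Hypothetical parameter sets; names and values are illustrative only.
omni = {"directional_mix": 0.0, "broadband_gain_db": 20.0}
directional = {"directional_mix": 1.0, "broadband_gain_db": 23.0}
steps = [blend_parameters(omni, directional, w) for w in transition_weights(4)]
```

Applying the entries of `steps` one after another moves the processing from the omni-directional to the directional program in four equal increments; reversing the list gives the opposite transition.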

To control such a transition between two main signal processing algorithms, the
processing means may further comprise a decision controller adapted to monitor the
elements of the classification vector and control transitions between the plurality of Hidden
Markov Models in accordance with a predetermined set of rules. This decision controller
may operate as an intermediate layer between the classification vector provided by the
HMMs and the one or plurality of related algorithm parameters. By monitoring element
values of the classification vector and controlling the value(s) of the related algorithm
parameter(s) in accordance with rules about maximum and minimum switching times
between HMMs and, optionally, interpolation characteristics between the algorithm
parameters, the inherent time scales that the HMMs operate on can be smoothed. If, for
example, a number of discrete HMMs operate on consecutive symbol values that each
represent a time frame of about 6 ms, it may be advantageous to lowpass filter or smooth
rapid transitions between a speech HMM and babble noise HMM that are caused by
pauses between words in conversational speech in a "cocktail party" type listening
environment. Instead of directly switching between two main signal processing algorithms
for every model transition, suitable time constants and hysteresis could be implemented in
the decision controller.
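
A minimal sketch of such a decision controller is given below, assuming a first-order lowpass filter on one element of the classification vector, two hysteresis thresholds and a minimum hold time. The class name, the time constant, the thresholds and the hold time are illustrative assumptions, not values taken from the specification; only the 6 ms frame time follows the example above.

```python
import math

class DecisionController:
    """Smooths one classification element and switches with hysteresis."""

    def __init__(self, frame_ms=6.0, time_constant_ms=2000.0,
                 on_threshold=0.7, off_threshold=0.3, min_hold_ms=1000.0):
        # Per-frame coefficient of a first-order lowpass filter.
        self.coeff = math.exp(-frame_ms / time_constant_ms)
        self.on_threshold = on_threshold
        self.off_threshold = off_threshold
        self.min_hold_frames = int(round(min_hold_ms / frame_ms))
        self.smoothed = 0.0
        self.active = False
        # Allow the very first switch without waiting for the hold time.
        self.frames_since_switch = self.min_hold_frames

    def update(self, probability):
        """Feed one element value; return whether the program is selected."""
        self.smoothed = (self.coeff * self.smoothed
                         + (1.0 - self.coeff) * probability)
        self.frames_since_switch += 1
        if self.frames_since_switch >= self.min_hold_frames:
            if not self.active and self.smoothed > self.on_threshold:
                self.active, self.frames_since_switch = True, 0
            elif self.active and self.smoothed < self.off_threshold:
                self.active, self.frames_since_switch = False, 0
        return self.active

# Speech element of the classification vector with a 300 ms pause (50 frames).
controller = DecisionController()
speech = [1.0] * 1000 + [0.0] * 50 + [1.0] * 200
states = [controller.update(p) for p in speech]
```

With these assumed settings the short pause between words does not deselect the speech program, whereas a sustained absence of speech lasting several seconds would.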

According to a preferred embodiment of the invention, the decision controller comprises a
second set of HMMs so that the processing means are adapted to process the
observation sequence of symbol values or the respective feature vectors with a first set of
Hidden Markov Models operating at a first time scale and associated with a first set of
predetermined sound sources to determine element values of a first classification vector.
Thereafter, the first classification vector is processed with the second set of Hidden
Markov Models operating at a second time scale and associated with a second set of
predetermined sound sources to determine element values of a second classification
vector.

The first time scale is preferably selected within the range 10 - 100 ms to allow the first
set of HMMs to operate on individual signal features of common speech and noise signals
and the second time scale is preferably selected within the range 1 - 60 seconds, such as
about 10 or 20 seconds, to allow the second set of HMMs to operate on changes between
different listening environments. Such environmental changes may happen when the user
of the hearing prosthesis moves between e.g. a subway station and a domestic
environment, or between an interior of a car and standing near a street with passing
traffic.

A second aspect of the invention relates to a method of generating automatic
classification of input signals in a hearing prosthesis, the method comprising the steps of:

receiving an acoustic signal from a listening environment by a microphone of the hearing
prosthesis to generate an input signal,

processing the input signal in accordance with a main signal processing algorithm and a
plurality of related algorithm parameters stored in a memory area to generate a processed
output signal,

segmenting the input signal into consecutive signal frames of time duration, $T_{frame}$,

generating respective feature vectors, $O_t$, representing predetermined signal features of
the consecutive signal frames,

processing the feature vectors with at least one Hidden Markov Model,
$\lambda^s = \{A^s, b(O_t), \alpha^s_0\}$, associated with a predetermined sound source to determine an
element value or values of a classification vector indicating a probability of the
predetermined sound source being active in the current listening environment,

controlling one or several values of the plurality of related algorithm parameters in
dependence of a determined element value or values of the classification vector to control
characteristics of the processed output signal,

converting the processed output signal into an electrical or an acoustic output signal or
signals by one or several output transducers,

thereby adapting characteristics of the main processing algorithm to the current listening
environment; wherein

$A^s$ = A state transition probability matrix;

$b(O_t)$ = Probability function for the observation $O_t$ for each state of the at least one Hidden
Markov Model;

$\alpha^s_0$ = An initial state probability distribution vector.

The respective feature vectors may be subjected to a vector quantisation process by
comparing each of the respective feature vectors, $O_t$, with a plurality of predetermined
feature vectors, and

determining, for substantially each respective feature vector, an associated symbol value
so as to generate an observation sequence of symbol values associated with the
consecutive signal frames. By processing the observation sequence of symbol values with
at least one discrete Hidden Markov Model, $\lambda^s_d = \{A^s, B^s, \alpha^s_0\}$, associated with the
predetermined sound source, the element value or values of the classification vector may
be determined; wherein

$B^s$ = An observation symbol probability distribution matrix.

For hearing aid applications, it has been found useful to utilise at least a few HMMs in
order to recognise at least a few corresponding and common listening environments so
that the method may comprise processing the feature vectors with a plurality of Hidden
Markov Models, or processing the observation sequence of symbol values with a
plurality of discrete Hidden Markov Models. According to this embodiment of the
invention, each of the discrete Hidden Markov Models or the Hidden Markov Models is
associated with a respective predetermined sound source to determine the element
values of the classification vector, each element value indicating a probability of the
respective predetermined sound source being active in the current listening environment.

According to a third aspect of the invention, a set of HMMs are utilised to recognise
respective isolated words to provide the hearing prosthesis with a capability of identifying
a small set of voice commands that may be used to control one or several functions of the
hearing aid. For this word recognition feature, discrete left-right HMMs are preferably
utilised rather than the ergodic HMMs that are preferred for the task of
providing automatic listening environment classification. Since a left-right HMM is a special
case of an ergodic HMM, the HMM structure that is used for the above-described ergodic
HMMs may be at least partly re-used for the left-right HMMs. This has the advantage that
DSP memory and other hardware resources may be shared in a hearing prosthesis that
provides both automatic listening environment classification and word recognition.
Preferably, a number of isolated word HMMs, such as 2 - 8 HMMs, is stored in the hearing
prosthesis to allow the processing means to recognise a corresponding number of distinct
words. The output from each of the isolated word HMMs is a probability for a modelled
word being spoken. Each of the isolated word HMMs must be trained on the particular
word or command it must recognise during on-line processing of the input signal. The
training could be performed by applying a concatenated sound source recording including
the particular word or command spoken by a number of different individuals to the
associated HMM. Alternatively, the training of the isolated word HMMs could be
performed during a fitting session where the words or commands modelled were spoken
by the user himself to provide a personalised recognition function in the user's hearing
prosthesis.
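The statement that a left-right HMM is a special case of an ergodic HMM can be illustrated with the transition matrices alone: a left-right model uses the same N x N storage layout, with every backward transition probability set to zero. The sketch below is illustrative only; the matrix values and helper names are made up for the example and do not come from the specification.

```python
def is_row_stochastic(A, tol=1e-9):
    """Every row holds non-negative probabilities that sum to one."""
    return all(min(row) >= 0.0 and abs(sum(row) - 1.0) < tol for row in A)


def is_left_right(A):
    """True if no transition leads back to a lower-numbered state."""
    return all(A[i][j] == 0.0 for i in range(len(A)) for j in range(i))


# Ergodic model: every state can be reached from every other state.
ergodic = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]

# Left-right model: same 4 x 4 layout, zeros below the diagonal.
left_right = [
    [0.8, 0.2, 0.0, 0.0],
    [0.0, 0.8, 0.2, 0.0],
    [0.0, 0.0, 0.8, 0.2],
    [0.0, 0.0, 0.0, 1.0],
]
```

Because both matrices share one layout, a routine written for the ergodic case can process the left-right case without modification.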

BRIEF DESCRIPTION OF THE DRAWINGS

A preferred embodiment of a software programmable DSP based hearing aid according to
the invention is described in the following with reference to the drawings, wherein

Fig. 1 is a simplified block diagram of a three-chip DSP based hearing aid utilising Hidden
Markov Models for input signal classification according to the invention,

Fig. 2 is a signal flow diagram of a main signal processing algorithm executed on the
three-chip DSP based hearing aid shown in Fig. 1,

Fig. 3 is a signal flow diagram of a dual layer Hidden Markov Model architecture.

DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT

In the following, a specific embodiment of a three chip set DSP based hearing aid
according to the invention is described and discussed in greater detail. The present
description discusses in detail only an operation of the signal processing part of a DSP-core
or kernel with associated memory circuits. An overall circuit topology that may form
the basis of the DSP hearing aid is well known to the skilled person and is, accordingly,
reviewed in very general terms only.

To support a required low-power and low voltage operation of the present chip set, logic
gates and circuits are preferably implemented in a low threshold voltage CMOS process.
Preferred processes are 0.25 - 0.5 µm CMOS processes with threshold voltages between
about 0.5 and 0.8 Volts.

In the simplified block diagram of Fig. 1, a conventional hearing aid microphone 105
receives an acoustic signal from a surrounding listening environment. The microphone
105 provides an analogue input signal on terminal MIC1IN of a proprietary A/D integrated
circuit 102. The analogue input signal is amplified in a microphone preamplifier 106 and
applied to an input of a first A/D converter of a dual A/D converter circuit 110 comprising
two synchronously operating converters of the sigma-delta type. A serial digital data
stream or signal is generated in a serial interface circuit 111 and transmitted from terminal
A/DDAT of the proprietary A/D integrated circuit 102 to a proprietary Digital Signal
Processor circuit 2 (DSP circuit). The DSP circuit 2 comprises an A/D decimator 13 which
is adapted to receive the serial digital data stream and convert it into corresponding 16 bit
data words at a lower sampling rate for further processing in a DSP core 5. The DSP core
5 has an associated program Random Access Memory (program RAM) 6, data RAM 7 and
Read Only Memory (ROM) 8. The signal processing of the DSP core 5, which is
described below with reference to the signal flow diagram in Fig. 2, is controlled by
program instructions read from the program RAM 6.



A serial bi-directional 2-wire programming interface 300 allows a host programming
system (not shown) to communicate with the DSP circuit 2, over a serial interface circuit
12, and a commercially available EEPROM 202 to perform up/downloading of signal
processing algorithms and/or associated algorithm parameters.



A digital output signal generated by the DSP-core 5 from the analogue input signal is
transmitted to a Pulse Width Modulator circuit 14 that converts received output samples to
a pulse width modulated (PWM) and noise-shaped processed output signal. The
processed output signal is applied to two terminals of hearing aid receiver 10 which, by its
inherent low-pass filter characteristic, converts the processed output signal to a
corresponding acoustic audio signal.



An internal clock generator and amplifier 20 receives a master clock signal from an LC
oscillator tank circuit formed by L1 and C5 that in co-operation with an internal master
clock circuit 112 of the A/D circuit 102 forms a master clock for both the DSP circuit and
the A/D circuit 102. The DSP-core 5 may be directly clocked by the master clock signal or
from a divided clock signal. The DSP-core 5 is preferably clocked with a frequency of
about 2 - 4 MHz.



Fig. 2 illustrates a simple application of discrete Hidden Markov Models to control
algorithm parameters of a main signal processing algorithm of the DSP based hearing aid
shown in Fig. 1. The discrete Hidden Markov Models are used in the hearing aid or
instrument for automatic classification of four different listening environments. In the
present embodiment of the invention, each listening environment is associated with a
particular pre-set frequency response implemented by FIR-filter block 450 that receives its
filter parameter values from a decision controller 430 comprising two sets of discrete
HMMs operating at differing time scales as explained with reference to Fig. 3. Switching
between different filter parameter values is performed when the user of the hearing aid is
moving between different listening environments, which is detected by a classification
algorithm 420. Another possibility is to let the classifier supplement an additional AGC
algorithm or system, which could be inserted between the blocking algorithm 400 and the
FIR-filter block 450, calculating, or determining by table lookup, gain values for the signal
frames delivered by the blocking algorithm 400.

The user may have a favorite frequency response/gain for each of the listening
environments that can be classified by the corresponding discrete Hidden Markov Model.
These favorite frequency responses/gains may be found by applying a number of
standard prescription methods, such as NAL, POGO etc.

Fig. 3 shows a signal flow diagram of a preferred implementation of the classification
block 420 and the decision controller 430 of Fig. 2 where they both are included in a dual
layer HMM algorithm architecture. A vector quantizer (VQ) block 510 precedes the dual
layer HMM architecture.

The structure of this algorithm makes it possible to have different switching times between 
different listening environments, e.g. slow switching between traffic and babble and fast 
switching between traffic and speech. 

The output signal OUT is a classification vector, in which each element contains the 
probability that a particular sound source of the four predetermined sound sources 520, 
521, 522 modelled by the respective discrete HMMs is active. 

In Fig. 2, the raw input signal at node IN from the output of the A/D decimator 13 in Fig. 1
is segmented to form consecutive signal frames, each with a duration of 6 ms. The input
signal is sampled at 16 kHz at this point so each frame consists of 96 samples. The signal
processing is performed along two different paths: one classification path through
signal blocks 400, 410, 420 and 430, and one main signal processing path through blocks
400 and 450. Pre-computed impulse responses of the respective FIR filters are stored in
the program RAM during program execution. The choice of parameter values or
coefficients for the FIR filter block 450 is performed by the decision controller 430 based
on the element values of the classification vector, and optionally on data from the
Spectrum Estimation Block.
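The frame size quoted above follows directly from the sampling rate and the frame duration. The short sketch below reproduces that arithmetic and shows one way of blocking a sample stream; it assumes non-overlapping frames and drops a trailing partial frame, neither of which is stated explicitly in the text, and the function names are hypothetical.

```python
SAMPLE_RATE_HZ = 16_000  # sampling rate at node IN (from the text)
FRAME_MS = 6             # frame duration (from the text)


def frame_length(sample_rate_hz=SAMPLE_RATE_HZ, frame_ms=FRAME_MS):
    """Number of samples in one frame."""
    return sample_rate_hz * frame_ms // 1000


def segment(samples, frame_len):
    """Split a sequence into consecutive, non-overlapping frames.

    Assumption: a trailing partial frame is discarded.
    """
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


B = frame_length()                    # 16000 * 6 / 1000 = 96 samples
frames = segment(list(range(200)), B)  # two full frames, 8 samples dropped
```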

The processing of the input signal in the above-mentioned classification path is described
in the following with reference to the classifier 500 implementation illustrated in Fig. 3:

The input at time t is a block, $x_t$, of size B, with input signal samples.

$$x_t = [x_t(0) \;\; x_t(1) \;\; \dots \;\; x_t(B-1)]^T$$

$x_t$ is multiplied with a window, w, and the Discrete Fourier Transform, DFT, is calculated.

$$X_t(k) = \sum_{n=0}^{B-1} w(n)\, x_t(n)\, e^{-j 2\pi k n / B}, \qquad k = 0..B/2-1$$

A feature vector is extracted or computed for every new frame. It is presently preferred to
use 12 cepstrum parameters for each feature vector:

$$c_t(k) = \sum_{n=0}^{B/2-1} \cos\!\left(\frac{2\pi k n}{B}\right) \log\lvert X_t(n)\rvert, \qquad k = 1..12$$

The output at time t is a feature column vector, $f_t$, with continuous valued elements.

$$f_t = [c_t(1) \;\; c_t(2) \;\; \dots \;\; c_t(12)]^T$$
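The feature extraction steps above (window, DFT, log magnitude, cosine transform) can be sketched in plain Python as follows. This is only an illustration: the Hann window, the cosine argument 2πkn/B and the small floor applied before the logarithm are assumptions of this sketch, and a direct O(B²) DFT is used for readability rather than speed.

```python
import cmath
import math


def hann(B):
    """Hann window of length B (the window type is an assumption)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / B) for n in range(B)]


def half_spectrum(x, w):
    """Windowed DFT of one block, for bins k = 0 .. B/2 - 1."""
    B = len(x)
    return [
        sum(w[n] * x[n] * cmath.exp(-2j * math.pi * k * n / B) for n in range(B))
        for k in range(B // 2)
    ]


def cepstrum_features(x, w, n_coeff=12, floor=1e-12):
    """Cepstrum parameters c(1) .. c(n_coeff) of one block.

    The 0'th coefficient is not computed, since it is discarded anyway.
    The floor avoids log(0) for silent bins (an assumption of this sketch).
    """
    B = len(x)
    log_mag = [math.log(max(abs(X), floor)) for X in half_spectrum(x, w)]
    return [
        sum(math.cos(2 * math.pi * k * n / B) * log_mag[n] for n in range(B // 2))
        for k in range(1, n_coeff + 1)
    ]
```

For a 96-sample block, `cepstrum_features(block, hann(96))` returns a list of 12 values corresponding to the feature vector of that frame.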

All coefficients except the 0'th coefficient are sent to a first layer of four discrete HMMs
modelling four respective predetermined sound sources, 520 - 522. In the present
embodiment of the invention, the four predetermined sound sources modelled are: {traffic
noise source, babble noise source, subway noise source, and a speech source}. The 0'th
cepstrum coefficient contains information about the total sound pressure level and is
preferably discarded to improve level independent classification of the input signal. Other
features, e.g. time domain features or other frequency-based features, may be added.

The VQ block 510 of the classifier operates instantly on each of the 6 ms signal frames.
The first layer operates in the time-scale of envelope modulations within the four
predetermined sound sources modelled by the four HMMs. A Long-term Layer 550
operates in a slower time-scale, such as 1 - 60 seconds, corresponding to shifts between
different predetermined sound sources in a given listening environment or the shift
between different listening environments.

The final output, OUT, of the classifier 500 is a column classification vector, $\hat\alpha^E_t$, indicating
the current estimated probabilities of each sound source being active in the listening
environment. The vector elements are $\hat\alpha^E_t(s) = P(\text{Source}_t = s \mid O; \text{models})$, where O is a
sequence of input signal observations provided by VQ block 510.

The input to the classifier 500 is a feature vector, provided by feature extraction block 410
of Fig. 2 and having continuous valued elements. The feature vector is quantized with the
VQ block 510 with e.g. M=64 centroids approximating the complete feature space.

$$O_t = \mathrm{quantize}(f_t),$$

where $O_t$ is an integer index between 1 and 64 to the nearest centroid in the VQ.
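The quantisation step can be sketched as a nearest-centroid search. The squared Euclidean distance used here is an assumption, since the text only speaks of the "nearest" centroid, and the tiny codebook is made up for the example (the embodiment uses M=64 centroids of 12 elements each).

```python
def quantize(f, codebook):
    """Return the 1-based index of the centroid closest to feature vector f.

    Assumption: "nearest" is measured by squared Euclidean distance.
    """
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(f, c))

    return 1 + min(range(len(codebook)), key=lambda m: sq_dist(codebook[m]))


# Hypothetical 3-centroid, 2-dimensional codebook for illustration only.
toy_codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
symbol = quantize([0.9, 1.2], toy_codebook)  # closest to the second centroid
```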

The VQ is trained off-line with the Generalized Lloyd algorithm (Linde, 1980). The training
material consists of real-life recordings of sound-source samples. These recordings have
been made through the input signal path, shown in Fig. 1, of the DSP based hearing
instrument.



Each of the four sound sources is modeled by a respective discrete HMM as mentioned
above. Each discrete HMM comprises a state transition probability matrix, $A^s$, an
observation symbol probability distribution matrix, $B^s$, and an initial state probability
distribution column vector, $\alpha^s_0$. A compact notation for a HMM is $\lambda^s = \{A^s, B^s, \alpha^s_0\}$. Each
sound source model has N=4 internal states and observes the stream of VQ symbol
values or centroid indices $O = [O_0 \;\cdots\; O_t]$, $O_t \in [1, M]$.

The required 64 * 12 coefficients of the Vector Quantiser common for all four discrete
HMMs are preferably stored in the data ROM of the DSP circuit 2 of Fig. 1. Coefficients
for each state transition probability matrix, $A^s$, and for each observation symbol probability
distribution matrix, $B^s$, are preferably also stored in the data ROM.
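The storage requirement implied by these figures can be tallied as below. The counts M, N, S and the 12 features come from the text; counting the initial state vector alongside the two matrices is an assumption, as the text does not say where that vector is stored, and no word length is assumed.

```python
M = 64           # VQ centroids (from the text)
N_FEATURES = 12  # cepstrum parameters per feature vector (from the text)
N = 4            # states per HMM (from the text)
S = 4            # sound source models (from the text)

codebook_coeffs = M * N_FEATURES        # 64 * 12 = 768
per_model_coeffs = N * N + N * M + N    # A (16) + B (256) + initial vector (4)
total_coeffs = codebook_coeffs + S * per_model_coeffs

print(codebook_coeffs, per_model_coeffs, total_coeffs)
```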

The purpose of the first layer is to estimate how well each predetermined sound source
model can explain the current input observation $O_t$. The output is a column vector $p_t$ with
elements indicating the conditional probability

$$p_t(s) = P(O_t \mid O_{t-1}, \dots, O_0; \lambda^1, \dots, \lambda^S; A^E; \text{Source}_t = s)$$

($A^E$ is defined later).

A slightly modified version of the standard forward algorithm (Rabiner, 1989) is used to
update recursively the state probability column vector $\hat\alpha^s_t$ in the present embodiment of
the invention. The elements $\hat\alpha^s_t(n)$ of this vector indicate the conditional probability that
sound source s is in state n, $\hat\alpha^s_t(n) = P(\text{state}_t = n \mid O; A^E; \text{Source}_t = s)$.

The recursive update equations are:

$$\alpha^s_t = \left[\hat\alpha^E_{t-1}(s)\,(A^s)^T \hat\alpha^s_{t-1} + \left(1 - \hat\alpha^E_{t-1}(s)\right)\alpha^s_0\right] \circ b(O_t), \qquad s = 1..S$$

$$p_t(s) = \sum_{n=1}^{N} \alpha^s_t(n), \qquad s = 1..S$$

$$\hat\alpha^s_t = \frac{\alpha^s_t}{p_t(s)}, \qquad s = 1..S$$

The operator $\circ$ defines elementwise multiplication.

The modified forward algorithm includes a scalar weighting factor, $\hat\alpha^E_{t-1}(s)$, which indicates
the probability that the sound source s was active at the previous frame t-1. If this
probability is high, the update equation is the standard forward algorithm. If this probability
is low, the state probability distribution is estimated from the initial probability vector
$\alpha^s_0$ instead. This weighting factor, $\hat\alpha^E_{t-1}(s)$, is fed back from the previous final output of the
Classifier Block.
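One possible reading of the weighted update described above is sketched below for a single discrete HMM. It assumes that the weighting factor linearly blends the propagated state vector with the initial state vector before the observation probabilities are applied; that blend, the function name `forward_step` and the handling of a zero observation probability are assumptions of this sketch, not a statement of the exact update used in the embodiment.

```python
def forward_step(A, B, alpha0, alpha_prev, weight, symbol):
    """One update of the state probabilities of a single discrete HMM.

    A[i][j]    : probability of moving from state i to state j
    B[j][m]    : probability of observing symbol m (0-based) in state j
    alpha0     : initial state probability distribution
    alpha_prev : normalised state probabilities from the previous frame
    weight     : probability that this source was active at the previous frame
    Returns (normalised state probabilities, observation probability p).
    """
    n_states = len(A)
    # Standard forward prediction: transpose(A) applied to the previous vector.
    predicted = [
        sum(A[i][j] * alpha_prev[i] for i in range(n_states))
        for j in range(n_states)
    ]
    # High weight: keep the prediction. Low weight: restart from alpha0.
    mixed = [
        weight * predicted[j] + (1.0 - weight) * alpha0[j]
        for j in range(n_states)
    ]
    # Elementwise multiplication with the observation probabilities.
    alpha = [mixed[j] * B[j][symbol] for j in range(n_states)]
    p = sum(alpha)
    if p == 0.0:
        return list(alpha0), 0.0  # assumed fallback for an impossible symbol
    return [a / p for a in alpha], p
```

Running this once per sound source model and collecting the returned `p` values yields the observation probability vector passed on to the next layer.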

Listening Environment - Long-term Layer

The purpose of this second layer, block 550, is to further improve the sound source
recognition by utilizing knowledge about how fast the listening environments may change.
The long-term layer consists of one fixed HMM, $A^F$, and one adaptive HMM:

$$A^E \in \{A^1 = A^{traffic},\; A^2 = A^{babble},\; A^3 = A^{speech},\; A^4 = A^{subway}\}.$$

The states of these HMMs are the corresponding sound sources. The output from the
fixed HMM gives a preliminary sound source recognition, $E = \arg\max_i \hat\alpha^F_t(i)$, which is
used to set the adaptive HMM, $A^E$. The second layer operating in the longer time scale is
used to further improve the recognition result. E.g. when the preliminary result gives an
indication of traffic noise, the adaptive HMM is set to have fast switching between traffic
noise and speech and very slow switching between traffic noise and babble noise. This
structure can have both fast and slow switching between the sound sources.



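The fixed and the adaptive forward variables are updated according to the forward
algorithm followed by normalization:

$$\alpha^F_t(i) = \left((A^F)^T \hat\alpha^F_{t-1}\right)(i) \cdot p_t(i), \qquad i = 1..S$$

$$\hat\alpha^F_t(i) = \frac{\alpha^F_t(i)}{\sum_{j=1}^{S} \alpha^F_t(j)}, \qquad i = 1..S$$

$$\alpha^E_t(i) = \left((A^E)^T \hat\alpha^E_{t-1}\right)(i) \cdot p_t(i), \qquad i = 1..S$$

$$\hat\alpha^E_t(i) = \frac{\alpha^E_t(i)}{\sum_{j=1}^{S} \alpha^E_t(j)}, \qquad i = 1..S$$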
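The interplay of the fixed and the adaptive HMM in the long-term layer can be sketched as below. The sketch assumes that both forward variables are propagated with their own transition matrix, multiplied elementwise with the first-layer output and then normalised, and that the adaptive matrix is selected by the arg max of the fixed result at the same frame. All names and the two-source toy matrices are hypothetical.

```python
def normalise(v):
    """Scale a vector to sum to one (uniform fallback if it sums to zero)."""
    total = sum(v)
    return [x / total for x in v] if total > 0 else [1.0 / len(v)] * len(v)


def long_term_step(A_fixed, A_adaptive_set, af_prev, ae_prev, p):
    """One frame of the long-term layer.

    A_fixed        : S x S transition matrix of the fixed HMM
    A_adaptive_set : list of S candidate S x S matrices, one per source
    af_prev, ae_prev : previous normalised fixed / adaptive forward variables
    p              : first-layer output, one probability per source
    Returns (fixed variable, adaptive variable, preliminary source index E).
    """
    S = len(p)

    def step(A, prev):
        predicted = [sum(A[i][j] * prev[i] for i in range(S)) for j in range(S)]
        return normalise([predicted[j] * p[j] for j in range(S)])

    af = step(A_fixed, af_prev)
    E = max(range(S), key=lambda i: af[i])  # preliminary recognition
    ae = step(A_adaptive_set[E], ae_prev)   # adaptive HMM selected by E
    return af, ae, E
```

The adaptive forward variable returned by `long_term_step` plays the role of the classifier output for the current frame.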



SPECTRUM ESTIMATION 

The spectrum estimation block 440 of Fig. 2 is optional but may be utilized to estimate an
average frequency spectrum which adapts slowly to the current listening environment.
Another possibility is to estimate two or more slowly adapting spectra for different sound
sources in a given listening environment, e.g. one speech spectrum and one noise
spectrum.

As mentioned above, the present HMMs are implemented as discrete HMMs, which are
more appropriate to use in a DSP based hearing prosthesis such as the present DSP
based hearing aid since they are easier to implement. There is, naturally, a trade-off
between performance, i.e. the ability to accurately recognize a plurality of differing
listening environments, on one side and memory and power consumption of the DSP
processor on the other side.

NOTATION

M        Number of centroids in Vector Quantizer
N        Number of states in HMM
S        Number of sound sources
$\lambda^s = \{A^s, B^s, \alpha^s_0\}$    Compact notation for a discrete HMM, describing source s, with N states
         and M observation symbols
B        Blocksize
$O = [O_0 \;\cdots\; O_t]$    Observation sequence
$O_t \in [1, M]$    Discrete observation at time t
$f_t$    Feature vector
w        Window of size B
$x_t$    One block of size B, at time t, of raw input samples
$X_t$    The corresponding discrete complex spectrum, of size B, at time t
$A^E \in \{A^{traffic}, A^{babble}, A^{speech}, A^{subway}\}$

CLAIMS

1. A hearing prosthesis with automatic sound classification, the hearing prosthesis
comprising:

a microphone adapted to generate an input signal in response to receiving an acoustic
signal from a listening environment,

an output transducer for converting the processed output signal into an electrical or an
acoustic output signal,

processing means adapted to process the input signal in accordance with a main signal
processing algorithm and a plurality of related algorithm parameters to generate the
processed output signal,

a memory area storing the plurality of related algorithm parameters for the main
processing algorithm,

the processing means being further adapted to:

segment the input signal into consecutive signal frames of time duration, $T_{frame}$, and
generate respective feature vectors, $O_t$, representing predetermined signal features of the
consecutive signal frames,

process the feature vectors with at least one Hidden Markov Model,
$\lambda^s = \{A^s, b(O_t), \alpha^s_0\}$, associated with a predetermined sound source to determine an
element value or values of a classification vector indicating a probability of the
predetermined sound source being active in the current listening environment,

control one or several values of the plurality of related algorithm parameters in
dependence of a determined element value or values of the classification vector,

thereby adapting characteristics of the main processing algorithm to the current listening
environment.

$A^s$ = A state transition probability matrix;

$b(O_t)$ = Probability function for an input observation $O_t$ for each state of the at least one
Hidden Markov Model;

$\alpha^s_0$ = An initial state probability distribution vector.

2. A hearing prosthesis according to claim 1, wherein the processing means are adapted
to:

compare each of the respective feature vectors, $O_t$, with a plurality of predetermined
feature vectors to determine, for substantially each respective feature vector, an
associated symbol value so as to generate an observation sequence of symbol values
associated with the consecutive signal frames,

process the observation sequence of symbol values with at least one discrete Hidden
Markov Model, $\lambda^s_d = \{A^s, B^s, \alpha^s_0\}$, associated with the predetermined sound source to
determine the element value or values of the classification vector,

$B^s$ = An observation symbol probability distribution matrix.

3. A hearing prosthesis according to claim 1 or 2, wherein the processing means are
adapted to process the feature vectors with a plurality of Hidden Markov Models, or
process the observation sequence of symbol values with a plurality of discrete
Hidden Markov Models,

each of the discrete Hidden Markov Models or the Hidden Markov Models being
associated with respective predetermined sound sources to determine the element values
of the classification vector, each element value indicating a probability of a respective
predetermined sound source being active in the current listening environment.

4. A hearing prosthesis according to any of the preceding claims, wherein the value of
$T_{frame}$ is between 1 millisecond and 100 milliseconds, such as about 5-10 milliseconds.

5. A hearing prosthesis according to claim 3, wherein at least some of the plurality of
Hidden Markov Models are adapted to model respective predetermined sound sources
selected from the group consisting of: {speech source, traffic noise source, multi-talker or
babble source, subway noise, transient noise, wind noise}.

6. A hearing prosthesis according to any of the preceding claims, wherein each of the
respective feature vectors comprises a plurality of frequency-domain parameters
representing the predetermined signal features of the consecutive signal frames.

7. A hearing prosthesis according to any of the preceding claims, wherein each of the
respective feature vectors comprises a plurality of time-domain parameters representing
the predetermined signal features of the consecutive signal frames.

8. A hearing prosthesis according to any of the preceding claims, wherein each of the
respective feature vectors comprises a plurality of cepstrum parameters representing the
predetermined signal features of the consecutive signal frames.

9. A hearing prosthesis according to any of claims 2-8, wherein the plurality of
predetermined feature vectors have been determined in an off-line training procedure
which utilised real-life sound source recordings and stored in non-volatile memory
locations of the hearing instrument.

10. A hearing prosthesis according to claim 9, wherein the real life sound
recordings have been applied to an input signal path of a target hearing prosthesis or by
performing an equivalent signal processing of the input signal to simulate characteristics
of the input signal path.

11. A hearing prosthesis according to any of claims 2-10, wherein each of the
respective feature vectors is associated with an integer symbol value.

12. A hearing prosthesis according to any of the preceding claims, wherein the Hidden
Markov Model or Models comprise at least one ergodic Hidden Markov Model.

13. A hearing prosthesis according to any of the preceding claims, wherein the at least
one predetermined Hidden Markov Model or each of the plurality of predetermined Hidden
Markov Models comprises between 2 and 10 states such as between 3-8 states.

14. A hearing prosthesis according to any of the preceding claims, wherein the at least
one predetermined Hidden Markov Model or each of the plurality of predetermined Hidden
Markov Models comprises between 8 and 256 discrete symbols such as 32 - 64 discrete
symbols.

15. A hearing prosthesis according to any of the preceding claims, wherein the processing
means are adapted to process the input signal in accordance with at least two different
predetermined signal processing algorithms, each being associated with a respective set
of algorithm parameters,

the processing means being further adapted to control a switching between the at least
two predetermined signal processing algorithms in dependence of the determined
element value or values of the classification vector.

16. A hearing prosthesis according to claim 15, wherein the processing means are
adapted to process two input signals from two respective omni-directional microphones by
a first main signal processing algorithm with a first set of algorithm parameters, and

adapted to process the two input signals by a second main signal processing algorithm
with a second set of algorithm parameters,

the processing means being adapted to control a transition between the first and second
main signal processing algorithms in dependence of the determined element value or
values of the classification vector.

17. A hearing prosthesis according to any of claims 3-16, wherein the processing means
further comprises a decision controller adapted to monitor the elements of the
classification vector and control transitions between the plurality of Hidden Markov Models
in accordance with a predetermined set of rules.

18. A hearing prosthesis according to any of claims 3-16, wherein the processing means
are adapted to process the observation sequence of symbol values or the respective
feature vectors with a first set of Hidden Markov Models, $\lambda_1$, operating at a first time scale
and associated with a first set of predetermined sound sources to determine element
values of a first classification vector, and

adapted to process the first classification vector with a second set of Hidden Markov
Models, $\lambda_2$, operating at a second time scale and associated with a second set of
predetermined sound sources to determine element values of a second classification
vector.

19. A hearing prosthesis according to any of claims 3-16, wherein the first time scale is
selected within the range 10 - 100 ms and the second time scale is selected within the
range 1 - 60 seconds.

20. A hearing prosthesis according to any of the preceding claims, wherein the processing
means comprises a software programmable processor executing the main processing
algorithm from a program RAM area and further comprising a data RAM for storage of the
plurality of related algorithm parameters or the sets of respective algorithm parameters.

21. A method of generating automatic classification of input signals in a hearing
prosthesis, the method comprising the steps of:

receiving an acoustic signal from a listening environment by a microphone of the hearing
prosthesis to generate an input signal,

processing the input signal in accordance with a main signal processing algorithm and a
plurality of related algorithm parameters stored in a memory area to generate a processed
output signal,

segmenting the input signal into consecutive signal frames of time duration, $T_{frame}$,

generating respective feature vectors, $O_t$, representing predetermined signal features of
the consecutive signal frames,

processing the feature vectors with at least one Hidden Markov Model,
$\lambda^s = \{A^s, b(O_t), \alpha^s_0\}$, associated with a predetermined sound source to determine an
element value or values of a classification vector indicating a probability of the
predetermined sound source being active in the current listening environment,

controlling one or several values of the plurality of related algorithm parameters in
dependence of a determined element value or values of the classification vector to control
characteristics of the processed output signal,

converting the processed output signal into an electrical or an acoustic output signal or
signals by one or several output transducers,

thereby adapting characteristics of the main processing algorithm to the current listening
environment,

$A^s$ = A state transition probability matrix;

$b(O_t)$ = Probability function for the observation $O_t$ for each state of the at least one Hidden
Markov Model;

$\alpha^s_0$ = An initial state probability distribution vector.

22. A method according to claim 21, comprising the steps of:

comparing each of the respective feature vectors, $O_t$, with a plurality of predetermined
feature vectors,

determining, for substantially each respective feature vector, an associated symbol value
so as to generate an observation sequence of symbol values associated with the
consecutive signal frames,

processing the observation sequence of symbol values with at least one discrete Hidden
Markov Model, $\lambda^s_d = \{A^s, B^s, \alpha^s_0\}$, associated with the predetermined sound source to
determine the element value or values of the classification vector,

$B^s$ = An observation symbol probability distribution matrix.

23. A method according to claim 21 or 22, wherein the processor is adapted to process
the feature vectors with a plurality of Hidden Markov Models, or process the observation
sequence of symbol values with a plurality of discrete Hidden Markov Models,

each of the discrete Hidden Markov Models or the Hidden Markov Models being
associated with respective predetermined sound sources to determine the element values
of the classification vector, each element value indicating a probability of the respective
predetermined sound source being active in the current listening environment.